Real-Time Performance of Windows XP Embedded


Andreas Harnesk (andreas.harnesk@gmail.com)
David Tenser (tenser@gmail.com)
April 30, 2006

ABB Corporate Research, Advanced Industrial Communication Group, Västerås, Sweden
Supervisor: Henrik Johansson (henrik.n.johansson@se.abb.com)

Mälardalen University, Department of Computer Science and Electronics, Västerås, Sweden
Supervisor: Frank Lüders (frank.luders@mdh.se)
Abstract
A business unit of ABB, providing embedded-system-based products for the automation industry, today runs its real-time core on dedicated hardware, isolated from any extra functionality. To stay competitive in the industry, development costs need to be reduced. One possible solution is to run both the real-time core and the extra functionality on the same hardware, and switch to Windows XP Embedded as the operating system.

In this report, the characteristics of XP as a real-time operating system are revealed by investigating how XP works under the hood. Two types of real-time implementations are evaluated: one implemented as a normal user thread, and another implemented in a device driver. Tests are conducted, measuring the execution times of the implementations.

The results show that a device driver implementation is more deterministic than a user-mode implementation. While the specific tests conducted yielded execution times with what appeared to be limited variation, no hard guarantees about an absolute worst-case execution time can be made. However, the tests show that execution times exceeding those measured are very unlikely. This indicates that XP might be suitable as a soft real-time operating system under certain controlled conditions.
Sammanfattning

A business unit at ABB that manufactures products based on embedded systems for the automation industry today runs its real-time core on separate hardware, isolated from other functionality. To remain competitive in the industry, development costs must be reduced. One possible solution could be to run both the real-time core and the other functionality on the same hardware, and to switch the operating system to Windows XP Embedded.

In this report, the characteristics of XP as a real-time operating system are revealed by examining how XP works under the hood. Two types of real-time implementations are tested: one implemented as a normal user thread and the other implemented in a device driver. Tests measuring execution times are carried out.

The results show that a driver implementation is more deterministic than a user-thread implementation. Although the specific tests yielded execution times with limited variation, no hard guarantees for an absolute worst-case execution time can be given. The tests do show, however, that execution times exceeding those measured are very unlikely. This indicates that XP may work as a soft real-time operating system under controlled conditions.
Acknowledgements
First and foremost we would like to thank our thesis supervisors Henrik Johansson and Frank Lüders, who have shown a large and consistent interest throughout the project. Our numerous scientific discussions and their many constructive comments have greatly improved this work.

Thanks to Roger Melander for giving us a deeper insight into how task interrupts work in a processor, and also for being such a nice person.

A special thanks to Jimmy Kjellsson for helping us set up the oscilloscope environment.

Last but not least, thanks to Dr. Tomas Lennvall for providing us with lots of constructive feedback.

Our experience at ABB Corporate Research has been nothing but positive and the people working there truly are top notch.
Contents

1 Introduction .... 9
  1.1 Microsoft Windows .... 9
2 Real-Time Concepts .... 11
  2.1 Hard and Soft Real-Time .... 11
  2.2 Tasks, Processes and Threads .... 11
  2.3 Shared Resources and Semaphores .... 12
  2.4 Priorities .... 12
  2.5 Scheduling .... 13
  2.6 Time Analysis .... 13
3 RTOS Requirements .... 14
  3.1 Requirement 1: Preemptible and Multitasking .... 14
  3.2 Requirement 2: Task Priorities .... 15
  3.3 Requirement 3: Predictable Task Synchronization Mechanisms .... 15
  3.4 Requirement 4: Avoid Priority Inversion .... 15
  3.5 Requirement 5: Predictable Temporal Behavior .... 16
4 Windows XP Embedded .... 17
  4.1 Background .... 17
  4.2 System Structure Overview .... 18
    4.2.1 Hardware Abstraction Layer .... 18
    4.2.2 Kernel .... 18
    4.2.3 Device Drivers .... 19
    4.2.4 Executive .... 19
  4.3 Thread Scheduling and Priority Levels .... 19
  4.4 Interrupt Handling .... 20
    4.4.1 Interrupt Service Routine .... 22
    4.4.2 Deferred Procedure Call .... 22
    4.4.3 Asynchronous Procedure Call .... 22
  4.5 Memory Management .... 22
    4.5.1 Kernel Page Pools .... 24
    4.5.2 Memory Manager .... 24
  4.6 Windows Driver Model .... 24
    4.6.1 I/O Request Packets .... 25
    4.6.2 Driver Types .... 25
    4.6.3 Device Objects .... 26
    4.6.4 I/O Request Processing .... 26
    4.6.5 Floating-Point Operations .... 28
5 Real-Time Aspects of XP .... 30
  5.1 Design Issues That Limit XP's Use As a RTOS .... 30
  5.2 Using XP as a RTOS .... 31
6 Extensions .... 33
  6.1 RTX .... 33
    6.1.1 Architecture .... 33
    6.1.2 Software Development .... 35
    6.1.3 Does RTX Meet the RTOS Requirements? .... 35
  6.2 INtime .... 36
    6.2.1 Architecture .... 36
    6.2.2 APIs .... 37
    6.2.3 Software Development .... 37
    6.2.4 Does INtime Meet the RTOS Requirements? .... 38
7 Related Work .... 39
  7.1 User Level Thread Implementation .... 39
  7.2 Driver Based Implementation .... 39
  7.3 Conclusions .... 40
8 Problem Description .... 42
  8.1 Suggested Model .... 43
  8.2 Purpose .... 44
  8.3 Scope .... 44
9 Methodology .... 45
  9.1 Conducted Tests .... 45
    9.1.1 User-Thread Implementation .... 46
    9.1.2 Driver Implementation .... 47
  9.2 Test System .... 47
    9.2.1 System Services .... 48
  9.3 Execution Time Measurement .... 48
    9.3.1 Performance Counter .... 48
    9.3.2 Time-Stamp Counter .... 50
    9.3.3 Oscilloscope .... 51
  9.4 System Load Conditions .... 52
    9.4.1 Idle .... 52
    9.4.2 CPU Load .... 52
    9.4.3 Graphics Load .... 52
    9.4.4 HDD Load .... 52
    9.4.5 Network Load .... 53
    9.4.6 Stress .... 53
  9.5 Test Names .... 53
  9.6 Additional Tests .... 53
10 Results .... 54
  10.1 TSC Measurement Results .... 54
    10.1.1 UserIdle .... 54
    10.1.2 UserCPU .... 54
    10.1.3 UserGraphics .... 55
    10.1.4 UserHDD .... 55
    10.1.5 UserNetwork .... 57
    10.1.6 UserStress .... 57
  10.2 Oscilloscope Test Results .... 57
11 Conclusions .... 60
  11.1 Better Determinism Than Reported In Previous Work .... 60
  11.2 Higher Task Priority Yields Better Determinism .... 60
  11.3 Driver Faster Than User-Mode .... 61
  11.4 Task Interruption Can Occur Anywhere .... 61
  11.5 Small Difference Between Normal and Prioritized DPC .... 61
  11.6 Algorithm Slower in Kernel-Mode .... 62
  11.7 No Guarantees Can Be Given .... 62
12 Future Work .... 63
  12.1 Use of an Ethernet Based Protocol for Communication .... 63
  12.2 Modify Interrupt Handling .... 63
  12.3 Run the Tests on XPE .... 64
  12.4 Evaluate Extensions .... 64
A Oscilloscope Test Results .... 66
List of Figures

1  Simplified Windows architecture[30]. .... 18
2  The full range of priority levels in XP[3]. .... 21
3  The virtual memory for two processes. The gray areas represent shared memory. .... 23
4  The flow of I/O requests through the system. .... 26
5  Sketch of the implementation suggested by the ABB business unit. .... 43
6  Full event cycle of the user-thread implementation. .... 46
7  Sequential time diagram of the user-thread implementation. .... 46
8  Full event cycle of the driver implementation. .... 47
9  Sequential time diagram of the driver implementation. .... 48
10 Measured start-stop time versus measurement number for the Performance Counter. (a) With Sleep(), (b) without Sleep(). .... 49
11 Measured start-stop time versus measurement number for the Time-Stamp Counter. (a) With Sleep(), (b) without Sleep(). .... 50
12 UserIdle algorithm execution time. (a) Scatter plot, (b) time distribution. .... 55
13 UserCPU algorithm execution time. (a) Scatter plot, (b) time distribution. .... 56
14 UserGraphics algorithm execution time. (a) Scatter plot, (b) time distribution. .... 56
15 UserHDD algorithm execution time. (a) Scatter plot, (b) time distribution. .... 57
16 UserNetwork algorithm execution time. (a) Scatter plot, (b) time distribution. .... 58
17 UserStress algorithm execution time. (a) Scatter plot, (b) time distribution. .... 58
18 Suggested model for interrupt interception. .... 64
List of Tables

1  The priority levels in XP. .... 20
2  Measured start-stop time in µs for the PeC and TSC. .... 51
3  Test names used throughout the report. .... 53
4  Algorithm execution time in µs for the TSC tests. .... 54
Glossary

APC     Asynchronous Procedure Call
API     Application Program Interface
BIOS    Basic Input/Output System
COTS    Commercial off the Shelf
CPU     Central Processing Unit
DDK     Microsoft Windows Driver Development Kit
DPC     Deferred Procedure Call
EDF     Earliest Deadline First
FDO     Functional Device Objects
FiDO    Filter Device Objects
FIFO    First In First Out
FTP     File Transfer Protocol
GPOS    General Purpose Operating System
GUI     Graphical User Interface
HAL     Hardware Abstraction Layer
HDD     Hard Disk Drive
I/O     Input/Output
IDE     Integrated Development Environment
IDT     Interrupt Descriptor Table
IDTR    Interrupt Descriptor Table Register
IRP     I/O Request Packet
IRQ     Interrupt Request
IRQL    Interrupt Request Level
ISR     Interrupt Service Routine
OS      Operating System
PC      Personal Computer
PeC     Performance Counter
PCI     Peripheral Component Interconnect
PDO     Physical Device Objects
RAM     Random Access Memory
RT-HAL  Real-Time Hardware Abstraction Layer
RTOS    Real-Time Operating System
RTSS    Real-Time Sub-System
SRI     Service Request Interrupt
TSC     Time-Stamp Counter
WCET    Worst-Case Execution Time
WDM     Windows Driver Model
XPE     Microsoft Windows XP Embedded
XP      Microsoft Windows XP Family (including XPE)
1 Introduction
Within ABB, and the automation industry in general, embedded systems are found in virtually every product and system. More and more functionality is being built in, and the performance requirements are becoming tougher.

Today it is very common that embedded systems run with the support offered by a real-time operating system (RTOS) in order to meet the requirements enforced by, for example, the industrial process being controlled. These RTOSs are often high performing and quite reliable. However, this often comes at the price of high cost, high complexity, and poor usability. In addition, it is not rare that such systems require special development tools and environments, and sometimes also special platforms. Since cost and usability are two important factors for the industry, this can be a problem in certain business areas.

In the past, automation systems were usually developed specifically for one or possibly a few products, which made the development extremely expensive. To decrease development costs in general, functionality is often grouped into independent, reusable, and well-defined solutions. This has been possible thanks to standardization from organizations such as IEEE, W3C, ISO, and IEC, but also because of de-facto standards like Microsoft Windows and the .NET framework.
1.1 Microsoft Windows
The Windows XP family of operating systems (OS) dominates the personal computer OS market[30]. It is a general purpose operating system (GPOS) designed to optimize throughput and average performance[34]. Because of its popularity, there is a strong interest in using XP as an embedded real-time system for the automation industry.

There are several reasons why the interest in XP as a RTOS is so strong. The most significant possible benefits are:

• Personal computer (PC) hardware is much cheaper than traditional embedded systems in the automation industry. For example, cheap Ethernet adapters could be used.

• Functionality can be developed using rapid prototyping and the .NET framework.

• A vast amount of software, development tools, and COTS components (e.g. ActiveX) are available to the developers[24].

• It is arguably easier to develop software for XP than for RTOSs because of the familiar integrated development environments (IDE) like Microsoft Visual Studio.

• Customers are inherently familiar with the user interface, since XP is a de-facto standard in the office world[30].

This leads to the key question: under which conditions can XP be used as a RTOS? This report tries to answer this question by investigating how XP works under the hood and by describing central functionality and mechanisms of XP, in particular those affecting real-time performance.
2 Real-Time Concepts
Before covering the requirements of a RTOS, the general real-time terms and concepts will be introduced. A real-time system is defined as a system where correct behavior depends not only on an error-free result, but also on when the result is delivered[23].
2.1 Hard and Soft Real-Time
If a real-time system fails to complete a calculation within a defined time frame, it is considered a system failure. The effect of missing a deadline varies between applications, and real-time systems are often divided into two separate classes, depending on how critical a missed deadline is. In a hard real-time system, deadlines must be met at all times, and a missed deadline could lead to catastrophic results[21]. An example of a hard real-time system is the steering system of an airplane, where a missed deadline during landing could result in a crash. A calculated result delivered after a deadline is considered useless in a hard real-time system.

Soft real-time systems[7], however, are allowed to miss deadlines sometimes, but doing so usually results in performance degradation[21]. An example of a soft real-time system is a DVD player, where missed deadlines during decoding could result in frame skips, leading to poor quality rather than failure.
2.2 Tasks, Processes and Threads
All real-time systems consist of tasks[7]. A task can be seen as a sequence of method executions. There are two types of tasks, known as periodic and nonperiodic tasks. Just as it sounds, periodic tasks are executed periodically, for example every 20 milliseconds. Nonperiodic tasks, also known as event triggered tasks, are executed when an event occurs[23]. Periodic tasks are often used for sensor reading, actuator control, and other time critical activities, while nonperiodic tasks are better suited for events that are less common, for example user interaction.

A process is an executing program, including the current values of the program counter, registers, and variables. The central processing unit (CPU) rapidly switches from process to process, running each for a short period of time. At any instant of time, the CPU is running only one process, but over the course of a longer period of time, it may run several processes. This technique, giving the illusion of parallelism, is known as multitasking[31]. The actual switch of the actively running process is known as a context switch.

Each process has an address space and at least one thread of execution. The thread has a program counter, keeping track of the instruction to execute next, along with registers and a stack. In modern operating systems, a process can have more than one thread, all sharing the same address space. Switching the actively running thread within a process is also called a context switch. However, a context switch within the same process is much faster, since the address space of the process remains unchanged. This is one of the most significant benefits of multithreaded OSs.

Both processes and threads can be seen as different types of tasks.
2.3 Shared Resources and Semaphores
A shared resource is a resource used by several tasks. It can be anything from network access to a global variable used for task synchronization. To ensure deterministic behavior of a real-time application, the usage of shared resources may need to be protected in some cases. More specifically, only one task should be able to access a shared resource at a time[31]. This protection mechanism is known as a critical section. For example, if a linked list is used as a shared resource, only one task can be allowed to access it during updates (write operations), since iterating over a list that is being written to can result in pointer errors.

This protection of resources is often realized using semaphores[23]. Put simply, a task has exclusive rights to a resource if the task has locked the semaphore. When a semaphore is locked, any other task requesting access to the resource is blocked during the execution of the critical section. When a task has left the critical section, the semaphore is unlocked and a blocked task can acquire it instead. If multiple tasks are waiting for the semaphore, different approaches can be taken to determine which task should be granted the semaphore[31].
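As a concrete illustration of the mechanism (a minimal user-mode sketch, not code from the thesis), the fragment below guards a shared linked list with a Win32 semaphore; the list, the function names, and the choice of a binary semaphore are assumptions made only for this example.

    #include <windows.h>
    #include <list>

    static std::list<int> g_sharedList;   // the shared resource
    static HANDLE g_sem;                  // binary semaphore guarding the list

    void InitProtection()
    {
        // Initial count 1, maximum count 1: at most one task may hold the semaphore.
        g_sem = CreateSemaphore(NULL, 1, 1, NULL);
    }

    void AppendValue(int value)
    {
        WaitForSingleObject(g_sem, INFINITE);   // lock: blocks while another task holds it
        g_sharedList.push_back(value);          // critical section: exclusive access
        ReleaseSemaphore(g_sem, 1, NULL);       // unlock: a blocked task may now acquire it
    }

Any task that reads or writes the list goes through the same lock, so at most one task is ever inside the critical section.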
2.4 Priorities
The concept of priority levels is important in a real-time system. All tasks are given a priority level, which determines the execution order and time-share of each task in a system. A low priority task is interrupted if a task with higher priority wants to execute during the same time frame.

A common priority problem in real-time systems occurs when shared resources are used between tasks with different priorities. In the following example, a system with three tasks and a semaphore is used. The tasks have the priority levels high, normal, and low. At first, the high and normal priority tasks are idle, and the low priority task runs and immediately locks the semaphore. After a while, the high priority task is ready to run and wants to use the semaphore, but since it is already locked, this task is blocked until the low priority task is finished with the semaphore. In the meantime, the normal priority task becomes ready to execute. Because it has a higher priority than the low priority task, it gets to execute instead and does so for an arbitrarily long time. During this time, the highest priority task has to wait for the blocked semaphore owned by the low priority task, which in turn cannot execute since a higher priority task is running. In other words, the priority of the tasks is inverted; a phenomenon called priority inversion. Mechanisms used to avoid priority inversion will be discussed in Section 3.4.
2.5 Scheduling
The execution order of tasks is decided by a scheduler. Two kinds of scheduling algorithms exist: off-line and on-line scheduling[23]. An off-line scheduler creates a schedule prior to code execution[7]. Because of this, an off-line scheduler can guarantee that no deadlines are missed, since it has complete knowledge of the system, assuming the timing constraints of each task are correct. However, this type of scheduling algorithm allows no event triggered tasks, since it is not known before runtime when an event will occur.

To allow event triggered tasks in a real-time system, an on-line scheduler needs to be used. The main drawback of an on-line scheduler is that deadline guarantees can only be given under certain controlled conditions, for example if no event triggered tasks exist[20].
2.6 Time Analysis
Time analysis is an important subject in real-time systems. Normally, developers are interested in the average execution time, the worst-case execution time (WCET), and the execution time variation[7]. The WCET and the variation are the most interesting of the three. The WCET is the longest time a task will take to execute. If this time is known, it is possible to design the system in such a way that deadlines are never missed.

The execution time variation is also important, since a low variation means better utilization of the hardware. Since most real-time systems are embedded and have limited hardware capacity[7], both determinism and memory efficiency are important to keep hardware costs down.
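The Win32 performance counter can be used to observe these quantities from user mode. The sketch below (illustrative only, not one of the measurement programs of the thesis; doWork() is a placeholder for the task being measured) records the best case, the observed worst case, and the variation over a number of runs. Note that a measured maximum is only an observed worst case, not a guaranteed WCET.

    #include <windows.h>
    #include <cstdio>

    static volatile int sink = 0;
    static void doWork()                       // placeholder workload
    {
        for (int i = 0; i < 1000; ++i) sink += i;
    }

    int main()
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);      // counter ticks per second

        double best = 1e12, worst = 0.0;
        const int runs = 100000;

        for (int i = 0; i < runs; ++i) {
            QueryPerformanceCounter(&t0);
            doWork();
            QueryPerformanceCounter(&t1);
            double us = (double)(t1.QuadPart - t0.QuadPart) * 1e6 / (double)freq.QuadPart;
            if (us < best)  best  = us;
            if (us > worst) worst = us;        // observed worst case only
        }
        printf("best %.2f us, observed worst %.2f us, variation %.2f us\n",
               best, worst, worst - best);
        return 0;
    }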
3 RTOS Requirements
Based on the definition of a real-time system, the results of a RTOS should be given within a predefined time frame. The RTOS needs to be time deterministic to guarantee the fulfillment of this requirement. Although time-deterministic behavior is important in a RTOS, it is not the only requirement for an OS to be considered a RTOS. The following requirements need to be fulfilled[34, 32, 19]:

• The OS has to be multitasked and preemptible.
• The notion of task priority has to exist.
• The OS has to support predictable task synchronization mechanisms.
• The OS must support a system for avoiding priority inversion.
• The OS must have predictable temporal behavior.

Because of the vague definition of a soft real-time system (a system allowed to occasionally miss deadlines), no definition of a RTOS can be based on what is required by soft real-time systems. As Timmerman says in [34]: "...the term 'real-time' is often misused to indicate a fast system. And fast can then be seen as 'should meet timing deadlines', thus meaning a soft real-time system." In other words, a GPOS would be considered a RTOS if soft real-time characteristics were sufficient. In this report, the term RTOS means an OS suitable to run hard real-time systems.
3.1 Requirement 1: Preemptible and Multitasking

According to the first requirement, the OS must be multitasked. Tasks can be implemented as both processes and threads in the same system. Since all threads in a process share the same address space, creating, destroying, and switching threads are many times faster than the same operations on processes[31]. Multithreaded OSs are therefore preferred over those that are just multitasked.

According to [34]: "...[The] scheduler should be able to preempt any thread in the system and give the resource to the thread that needs it most. The OS (and the hardware architecture) should also allow multiple levels of interrupts to enable preemption at the interrupt level." In other words, a preemptible system must be capable of preempting a thread at any time during execution. Almost all OSs are multitasked, multithreaded, and offer preemption. However, most GPOSs do not allow the kernel to be preempted. Because of this limitation, a high-priority task cannot preempt a kernel call made by a low-priority task[19].
3.2 Requirement 2: Task Priorities
The notion of task priorities needs to exist in order to have some predictability of task execution order and to ensure that the most critical tasks get to run first. There are many different scheduling algorithms available to make this possible. The optimal solution for dynamic priorities (priorities assigned during runtime) is called earliest deadline first (EDF) and lets the task with the earliest deadline execute. But since complete knowledge of the task execution needs to be known in advance, this algorithm is not suitable in event triggered systems[23]. Although tasks in a system running EDF scheduling do not have priority levels assigned during system design, all tasks are still effectively prioritized according to the earliest deadline. Rate monotonic is the optimal algorithm for systems with statically prioritized tasks (task priorities are decided in advance). One of the major drawbacks of this scheduling algorithm is its assumption, rarely realistic, that all tasks execute without any interaction[23].
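To make the EDF idea concrete, the small sketch below (purely illustrative, not part of the thesis) selects, among the tasks that are ready, the one with the earliest absolute deadline as the next task to run.

    #include <vector>
    #include <cstdint>

    struct Task {
        const char* name;
        uint64_t absoluteDeadline;   // e.g. microseconds since system start
        bool ready;
    };

    // Earliest Deadline First: run the ready task whose deadline is closest.
    const Task* pickNextEdf(const std::vector<Task>& tasks)
    {
        const Task* next = nullptr;
        for (const Task& t : tasks) {
            if (t.ready && (next == nullptr || t.absoluteDeadline < next->absoluteDeadline))
                next = &t;
        }
        return next;   // nullptr if no task is ready
    }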
3.3 Requirement 3: Predictable Task Synchronization Mechanisms
It is unlikely that tasks in a RTOS execute independently of each other. Because of this, a RTOS needs predictable synchronization between tasks[23]. By using shared resources guarded by locks, safe inter-process and inter-thread communication can be guaranteed. In a RTOS, this locking mechanism needs to be time deterministic.
3.4 Requirement 4: Avoid Priority Inversion
Priority inversion is a classic real-time problem and must be handled in a RTOS. There is no way to eliminate priority inversion when shared resources and priority levels are used[19], and both are requirements of a RTOS. A RTOS therefore needs a mechanism for minimizing the duration of the inversion. One solution to this problem is known as a shared resource protocol, which determines rules for accessing shared resources.

One of the simplest and most widely used shared resource protocols is the priority inheritance protocol. It reduces the blocking time by giving the low priority task the same priority as the blocked task waiting for the semaphore. To reduce the blocking time even further, a task is not allowed to hold a locked semaphore when its execution completes. The downside of this protocol is that a high priority task can be blocked by several low priority tasks[23]. For example, if a high priority task needs two semaphores to execute and both of them are locked when execution starts, the high priority task must wait until both low priority tasks have released their semaphores before execution can start.
3.5 Requirement 5: Predictable Temporal Behavior
The final requirement states that the system activities (system calls, task switching, interrupt latency, and interrupt masking) should have predictable temporal behavior. Some papers argue that predictable temporal behavior is not enough, and that timing constraints should even be given by the RTOS manufacturer. Also, system interrupt levels and device driver interrupt request levels need to be known by the developer of the real-time system[34]. Interrupts are described in Section 4.4.
4 Windows XP Embedded
In the previous section, the basic concepts of real-time systems were introduced, along with a list of requirements an RTOS needs to fulfill. This section introduces Windows XP Embedded (XPE) and explains how its relevant OS mechanisms work.

Because XPE is a componentized version of Windows XP Professional (XP), all technical operating system details for XP, such as thread priorities, scheduling algorithms, and inter-process communication, also apply to XPE[37]. Applications designed for XP can run without modification on XPE, as long as the required libraries for the application are installed (for example, a .NET application will obviously need the .NET framework)[37]. Furthermore, the same driver model (WDM) is used, which makes all device drivers for XP available to the embedded system[36].
4.1 Background
Most of the previous research on real-time applications and Windows has been based on Windows NT 4.0. There are several reasons for this:

• Windows NT was designed from the ground up as a 32-bit operating system with reliability, security, and performance as its primary goals[8]. This means NT was considered a new technology, which incidentally is what the letters NT stand for[31].

• Windows NT 4.0 was the first version of NT that sported the popular user interface from Windows 95, which made it easier for companies to migrate to it; and many of them did so[8, 31].

• Since NT 4.0, the kernel has not changed in terms of real-time characteristics. The scheduling algorithm, thread priorities, and interrupt routines have remained the same throughout the different versions of NT[30]. This means that the limitations of using the platform as a RTOS are already well known.

Because this report examines the real-time characteristics of XPE, it is important to know that XP, which XPE derives from, is part of the Windows NT family of operating systems and its formal version number is NT 5.1. As stated above, XP also uses the same scheduling algorithms and interrupt handling routines as NT 4.0 and NT 5.0 (commonly known as Windows 2000). This makes the previous research (see [27, 24, 34]) highly relevant for this report, even though it was performed on NT 4.0.

From here on, the term XP will be used for information applying to both Windows XP and Windows XP Embedded. The term XPE will be used only for information applying specifically to Windows XP Embedded.
Figure 1: Simplified Windows architecture[30]. (User mode: system support processes, service processes, user applications, and environment subsystems on top of the subsystem DLLs. Kernel mode: the executive, the kernel, device drivers, windowing and graphics, and the hardware abstraction layer (HAL).)
4.2 System Structure Overview
In order to understand how XP works with threads, priorities, and interrupts,
it is necessary to gain some basic knowledge about the structure of the OS.
4.2.1 Hardware Abstraction Layer
One of the primary design goals of NT was to make it portable across different platforms[31]. Therefore, NT/XP is divided into several layers, each one using the services of the ones below it. As shown in Figure 1, the first layer, working closely with the hardware, is called the Hardware Abstraction Layer (HAL). Its purpose is to provide the upper levels of the OS with a simplified abstraction of the often very complex hardware below it, in order to allow the rest of the OS to be mostly platform independent. For example, the HAL has calls to associate interrupt service procedures with interrupts and set their priorities[31]. The HAL is delivered in source code (requiring a special agreement with Microsoft). It is thus possible to redefine how XP handles the system clock, interrupts, and so forth[35]. As will be shown in Section 6, some third party solutions make use of a modified HAL to achieve predictable temporal behavior in XP.
4.2.2 Kernel
Above the HAL is the actual kernel layer. The purpose of the kernel is to make the rest of the OS hardware independent. This is where XP handles thread management and scheduling, context switches, CPU registers, page tables, and so on. The actual scheduling algorithm used will be discussed later in this section. The kernel has another important function: it provides support for two classes of system objects, namely control objects and dispatcher objects.

Control objects are objects controlling the system. The most important object to know about is the deferred procedure call (DPC), which is used to split off the non-time-critical part of an interrupt service procedure from the time-critical part. This mechanism will be explained in greater detail in Section 4.4.2.

Dispatcher objects include semaphores, mutexes, events, and other objects threads can wait on. Since this is closely related to thread scheduling, dispatcher objects are handled in the kernel.
4.2.3 Device Drivers
Device drivers work closely with the kernel. Running in kernel-mode, they
have direct memory access and can manipulate system objects and I/O devices. However, a device driver can also do things not related to devices,
such as performing calculations. This part of the system is relevant for this
report. The Windows Driver Model (WDM) will be discussed in greater
detail in Section 4.6.
4.2.4 Executive
The last part of the system structure mentioned in this brief overview is what
is known as the executive. It is a collection of components working together
with the kernel to provide the rest of the system with a device-independent
abstraction. Among other things, the executive contains components for
managing processes, I/O, and memory. The I/O Manager, for example,
plays an important role in interrupt handling, explained in Section 4.4.
4.3 Thread Scheduling and Priority Levels
Windows XP has 32 priority levels for user-mode threads, numbered 0 to 31. A process can have one of the following priority classes: Idle, Below Normal, Normal, Above Normal, High, and Realtime. Each thread can then have a relative priority compared to the other threads in the process. The available thread priority levels are: Idle, Lowest, Below Normal, Normal, Above Normal, Highest, and Time Critical[31]. This sums up to a total of 42 combinations, which are mapped to the 32 priority levels according to Table 1.

    Win32 thread priority   Win32 process class priorities
                            Realtime  High  Above Normal  Normal  Below Normal  Idle
    Time critical              31      15       15          15         15        15
    Highest                    26      15       12          10          8         6
    Above normal               25      14       11           9          7         5
    Normal                     24      13       10           8          6         4
    Below normal               23      12        9           7          5         3
    Lowest                     22      11        8           6          4         2
    Idle                       16       1        1           1          1         1

Table 1: The priority levels in XP.

As seen in Table 1, the class priorities ranging from High to Idle have the same upper and lower priority limits. This makes it possible for the XP scheduler to dynamically make priority adjustments to maximize average performance[24]. For example, when an I/O operation completes a request that a thread was blocked waiting for, the priority of that thread is increased. The purpose of this is to maximize I/O utilization[31] and is not the same as priority inheritance. Note that the dynamic priority boosts never increase the priority above level 15.

As a result of these dynamic priority properties, none of the affected priority classes are predictable, and using them in a real-time application would render the application non-deterministic. The number of priority levels to be considered for a real-time application is thus reduced from 32 to just 7 (the Realtime class).

The thread priority levels in the Realtime class are all higher than those of the dynamic classes, making them more suitable for real-time application usage. It should be clear that, although this priority class is called Realtime, no guarantees are given by the operating system. It simply means that it is the highest priority class available for user-level threads and that no dynamic priority adjustments are ever made on threads in this class[31].

Threads sharing the same priority level are processed in First-In-First-Out (FIFO) order.
Figure 2 shows the full range of priority levels in XP, including the ISRs
and DPCs.
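As a brief illustration (a sketch, not code from the thesis), a user-mode application places itself in the Realtime class and raises a thread to Time Critical with the standard Win32 calls below; together these map to priority level 31 in Table 1. Setting the Realtime class requires sufficient privileges, otherwise Windows silently falls back to the High class.

    #include <windows.h>

    int main()
    {
        // Realtime process class + Time Critical thread priority = level 31 (Table 1).
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

        // ... time-critical work; no dynamic priority boosts are applied at this level ...
        return 0;
    }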
4.4 Interrupt Handling
Interrupts in XP have higher priority than all the user-level threads mentioned in Section 4.3, including those in the real-time priority class.

All hardware platforms supported by XP implement an interrupt controller that manages external interrupt requests (IRQs) for the CPU. Once an interrupt occurs, the CPU gets the interrupt number (known as a vector), which is translated from the IRQ by the interrupt controller. This vector is then used as an index in the interrupt descriptor table (IDT) to find the appropriate routine for handling the interrupt[30, 13]. XP fills the IDT with pointers to routines for interrupt handling at start-up. To locate the IDT, the CPU reads the IDT register (IDTR), which stores the base address and size of the IDT[13]. XP also uses the IDT to map vectors to IRQs[30].
Figure 2: The full range of priority levels in XP[3].
Since interrupts are handled differently by different CPU architectures, XP provides an abstract scheme to deal with all platforms. This HAL scheme provides a common priority handling mechanism for interrupt requests by assigning an interrupt request level (IRQL) to all interrupts[2]. IRQLs range from 0 to 31, where higher numbers represent higher priority. The dynamic and real-time priority spectrum for user threads all run at IRQL 0 and have an internal priority scheme as described earlier in this report.

Because the CPU is always executing code at a specific IRQL, stored as part of the execution context of the executing thread, the IRQL is used to determine execution order.

When an interrupt occurs, the CPU compares the IRQL of the incoming interrupt to the current IRQL. If the incoming interrupt has a higher IRQL than the current one, the trap handler saves the state information of the currently executing thread, raises the IRQL of the CPU to the value of the incoming interrupt, and calls the interrupt dispatcher, which is a part of the I/O Manager. The interrupt dispatcher calls the appropriate routine for handling the interrupt. When the interrupt routine is finished, the CPU lowers the IRQL to the value of the preempted thread and continues execution.

If the IRQL of the interrupt is lower than or equal to the current IRQL of the CPU, the interrupt request is left pending until the IRQL drops below the value of the request[30, 2].

Two classes of IRQLs exist. The lowest three IRQLs (0-2) belong to the software class. They consist of PASSIVE_LEVEL, used for normal thread execution, DISPATCH_LEVEL, used for thread scheduling, memory management, and execution of DPCs, and APC_LEVEL, used for asynchronous procedure call execution[2]. Asynchronous procedure calls and deferred procedure calls are explained later in this section.

The remaining levels (3-31) belong to the hardware class. The lowest 24 IRQLs in this class (3-26) are reserved for device interrupts, also known as DIRQLs. They are used for interrupt service routine (ISR) execution[2, 30].
4.4.1 Interrupt Service Routine
The interrupt dispatcher, among other things, makes the system execute an
ISR mapped to the device triggering the interrupt, which runs at the same
IRQL as the interrupt[31]. For a more detailed explanation of the interrupt
dispatcher, see Section 4.6.4 on page 26.
Only critical processing is performed in the ISR, for example copying or moving a register value or buffer. An ISR must complete its execution very quickly to avoid slowing down the operation of the device triggering the interrupt and delaying all processing at lower IRQLs.
4.4.2 Deferred Procedure Call
Although an ISR might move data from a CPU register or a hardware port into a memory buffer, in general the bulk of the processing is scheduled for later execution in a DPC, which runs when the processor drops its IRQL to DISPATCH_LEVEL[31]. The DPCs are handled by the scheduler in a FIFO queue. Since interrupts have higher IRQLs, a DPC can be preempted by an interrupt at any time, which means the FIFO queue can sometimes grow very long.

However, it is possible to set a higher priority for a scheduled DPC using a special kernel method. This effectively places the DPC first in the queue[25].
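The kernel-mode fragment below sketches this mechanism (an illustration following the DDK conventions, not code from the thesis; MyIsr, MyDpcRoutine, and the use of a single global DPC object are assumptions made for the example). The ISR does only the time-critical work and queues a DPC that has been marked as high importance, so it is inserted at the head of the DPC queue.

    #include <ntddk.h>

    static KDPC g_dpc;

    // Deferred work: runs later at DISPATCH_LEVEL.
    VOID MyDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
    {
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(Context);
        UNREFERENCED_PARAMETER(Arg1);
        UNREFERENCED_PARAMETER(Arg2);
        // Bulk of the interrupt processing goes here.
    }

    VOID SetupDpc(PVOID context)
    {
        KeInitializeDpc(&g_dpc, MyDpcRoutine, context);
        KeSetImportanceDpc(&g_dpc, HighImportance);   // place the DPC first in the queue
    }

    // Interrupt service routine: only time-critical work, then defer the rest.
    BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID ServiceContext)
    {
        UNREFERENCED_PARAMETER(Interrupt);
        UNREFERENCED_PARAMETER(ServiceContext);
        // ... acknowledge the device and move data out of its registers ...
        KeInsertQueueDpc(&g_dpc, NULL, NULL);         // schedule the deferred processing
        return TRUE;                                  // the interrupt was ours
    }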
4.4.3 Asynchronous Procedure Call
There are also asynchronous procedure calls (APCs), which run below the DPC priority level. APCs are similar to DPCs, but they must execute their code in the context of a specific user process[3], which means a full process context switch may need to be carried out by the OS before they can run.

ISRs and DPCs, on the other hand, only manipulate the kernel memory shared by all processes and can therefore run within any process context.
4.5 Memory Management
The concept of virtual memory is used in XP. One of the main reasons for this is to allow the system to use more memory than is physically available. For example, an application requiring 500 MB of memory can run on a computer with only 256 MB of random access memory (RAM) available. This can be achieved by moving blocks of memory out to the hard disk drive (HDD) when not directly needed by an application, to make room for the ones that are actually needed[31, 9]. These blocks of memory, or pages, are said to be mapped out from memory when not needed. Likewise, when pages are needed by an application and not currently in memory, they are mapped in again. Pages not loaded in memory are stored in paging files. This allows the system to use as much memory as the RAM and paging files combined.

All processes running in XP use pages to access memory. A fixed page size is used for a specific system architecture. On the Pentium architecture, the page size is 4 KB. An address in the virtual address space is 32 bits long, which results in a total of 4 GB of virtual memory for each process[31, 9, 30].

The virtual memory for each process is split into two halves. The lower 2 GB half is used for process code and data, except for about 250 MB, which is reserved for system data. This system data is shared by all user processes and contains system counters and timers.

The upper 2 GB half of the virtual address space is the kernel memory, containing the operating system itself, page tables, the paged pool, and the nonpaged pool. Except for the page tables, the upper memory is shared by all user processes in the system. However, it is only accessible from kernel-mode, which means the user processes are not allowed to directly access this memory[31, 9].
Figure 3: The virtual memory for two processes. The gray areas represent shared memory. (For each process the diagram shows 1750 MB of private code and data plus 250 MB of system data in the lower half, and the page table, OS, nonpaged pool, and paged pool in the upper 2 GB half, backed by physical memory.)
The page tables store information about the available pages for each process in the system. Every process has its own private page table.

In order for a user process to access the kernel memory, system calls (including driver requests) need to be made. When a system call is executed, the system traps into kernel-mode, which makes the entire kernel memory visible to the process. The virtual address space remains unchanged, which makes the processing of system calls efficient[31].
4.5.1 Kernel Page Pools
The nonpaged and paged memory pools are used by drivers and the OS for data structures. Drivers are loaded in the nonpaged pool and can allocate memory from both the nonpaged and the paged memory pools[31, 30].

Although both the paged and the nonpaged memory pools are accessible to all processes, one major difference exists. While the paged pool is handled just like the private memory of each process, the nonpaged pool is never mapped out from memory, which means no page faults can occur when accessing pages allocated in the nonpaged pool.

One of the reasons for having a nonpaged pool is to guarantee that some parts of the system are never paged out. For example, if the memory manager itself, running at DISPATCH_LEVEL, were mapped out, no other pages could be mapped in, leading to system failure[30]. For this reason, memory in the paged pool can only be accessed at the PASSIVE_LEVEL IRQL. Higher IRQLs must use the nonpaged pool[9, 30].
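As a small kernel-mode sketch (illustrative, not from the thesis; the function names and the pool tag are arbitrary), a driver that must touch a buffer at DISPATCH_LEVEL or above would allocate it from the nonpaged pool, so that accessing it can never cause a page fault:

    #include <ntddk.h>

    #define RT_POOL_TAG 'fuBR'   // arbitrary four-character pool tag

    PVOID AllocateRtBuffer(SIZE_T bytes)
    {
        // Nonpaged memory stays resident and is safe to use at any IRQL.
        return ExAllocatePoolWithTag(NonPagedPool, bytes, RT_POOL_TAG);
    }

    VOID FreeRtBuffer(PVOID buffer)
    {
        if (buffer != NULL) {
            ExFreePoolWithTag(buffer, RT_POOL_TAG);
        }
    }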
4.5.2 Memory Manager
The memory manager is responsible for moving pages in and out of memory. When a process accesses a page that is not mapped in, a page fault occurs. This page fault is handled by the memory manager, which loads the page into memory. The process causing the page fault is interrupted and has to wait for the memory manager to load the page into memory before execution can continue.

For performance reasons, the page replacement algorithm in XP strives to always have a certain amount of free physical memory pages available. This decreases the amount of work when a page needs to be mapped in, since only one disk operation is needed to read the page, as opposed to both mapping out a page to disk and mapping in another. To make sure enough free pages exist, the system runs the balance set manager every second. If the number of free pages decreases below a specific threshold, the memory manager starts mapping out pages not needed at the moment[31, 30].
4.6 Windows Driver Model
The WDM is a framework for device drivers that is source code compatible with Windows 98 and later. It includes a library offering a large set of routines to the developer[25, 36]. There are two major classes of WDM drivers. The first class is called user-mode drivers. Drivers in this class run in user-mode, and the class is mostly intended for testing purposes with simulated hardware.

The other class, which this report will focus on, is the kernel-mode driver. As the name implies, drivers in this class run in kernel-mode. Because of the direct hardware access available in kernel-mode, this type of driver is used to control hardware. Even though kernel-mode drivers are often used to control hardware, simulated hardware or no hardware at all can be used by these drivers[2, 25].
4.6.1 I/O Request Packets
Drivers written for the WDM framework should handle input/output (I/O) requests as specified in the I/O Request Packet (IRP). I/O requests are I/O system service calls from user-mode applications, such as read and write operations[2]. An IRP determines the work order, i.e. in what order different subroutines of a driver should be executed to complete an I/O request. When the IRP is created, it is passed to the I/O Manager, which determines which driver and subroutine should execute. The subroutine performs its work on the IRP and passes it back to the I/O Manager, which sends it to the next subroutine. When the IRP is completed, the I/O Manager destroys it and sends the status back to the requestor[25, 2].
4.6.2 Driver Types
Three types of drivers exist under WDM.

Function drivers are responsible for I/O operations, handling interrupts within the driver, and deciding what should be controllable by the user.

Bus drivers handle the connection between the hardware and the rest of the computer. The PCI bus driver, for example, detects the cards on the PCI bus. It determines the I/O-mapping or memory-mapping requirements of each card. Both function and bus drivers are required for all hardware devices.

The third type of driver, the filter driver, can be supplied by manufacturers to modify the functionality of the function driver above it[25]. This is known as an upper filter driver. There are also lower filter drivers that work as a filter between the bus driver and the function driver. A good example of a lower filter driver is one that encrypts data before it reaches the bus driver, which means neither the function driver nor the bus driver needs to know about the encryption.
4.6.3 Device Objects
To help software manage hardware in Windows, device objects are used. Each type of driver has a device object mapped to it. Bus drivers are represented by physical device objects (PDO). Functional device objects (FDO) are mapped to the function drivers. Both above and below the FDO, filter device objects (FiDO) may exist, which are mapped to filter drivers[2, 25].
4.6.4 I/O Request Processing
When an I/O request is raised in the system, it is processed according to the steps in Figure 4. Although not all I/O requests go through all these steps, this model represents a typical I/O request flow. A code sketch of the driver side of these steps follows the numbered list.
Figure 4: The flow of I/O requests through the system. (The diagram shows a user-mode application issuing io_request() through the Win32 kernel to the I/O Manager, which invokes the driver's dispatch routine (IRP marked pending, StartIO called), the HAL delivering the device interrupt to the ISR (which enables interrupts and requests a DPC), and the DPC marking the IRP as successful and completing it back through the I/O Manager to the user thread; the numbered arrows 1-12 correspond to the steps below.)
1. When an I/O request is invoked by a user-thread, the system traps
into kernel-mode and passes the request to the I/O Manager.
2. In the I/O Manager the request is translated into an IRP, describing the work order of the drivers involved in handling the request. Before invoking the right dispatch routine of the driver (one dispatch routine exists per function offered by the driver), the I/O Manager prepares the user buffer and the access method to this buffer[2, 25].

3. If no device activity requiring interrupts is needed for the I/O request (for example, when reading zero bytes or writing to a port register), the dispatch routine marks the IRP as completed, executes the rest of the dispatch routine, and sends it back to the I/O Manager, which notifies the user-thread of the completion of the I/O request. The scenario of reading zero bytes can occur if polling (periodical status checking) drivers are used[2, 25].

Usually, however, the I/O request actually needs some device activity before completion. In this case the IRP is marked as pending and the Start I/O function of the driver is called before the IRP is passed back to the I/O Manager. The dispatch routine also performs parameter validation. For function drivers, the parameter validation has to take the limitations of the underlying bus driver into account. For example, if the total transfer size exceeds the limits of the bus driver, the dispatch routine is responsible for splitting the request into multiple requests[2, 25].

4. The I/O Manager then queues the call to the Start I/O routine of the driver, which starts up the device. The first thing done by the I/O Manager when a device is requested to start is to check whether the device is busy, that is, whether a previous IRP is marked as pending for the device. If the device is busy, the new IRP is queued. If the device has no IRP marked as pending, the queue is skipped and the Start I/O routine of the device is called directly, which starts the device by safely accessing the device registers[2, 25].
5. The IRP is then returned to the I/O Manager, which awaits a device
interrupt[2, 25].
6. HAL receives the device interrupt when it occurs.
7. The interrupt is then routed to the interrupt dispatcher of the I/O
Manager.
8. Most devices are connected to an interrupt request level (IRQL), which means the interrupt dispatcher calls the ISR of a device connected to a specific IRQL when a device interrupt occurs. Some devices do not use interrupts and require polling to notice any changes for that device[2]. Since IRQLs can be shared by other drivers, the first thing the ISR does is to check whether or not the interrupt was intended for the specific device. If not, the interrupt request is passed back to the interrupt dispatcher, which sends it to another device connected to the same IRQL[2, 25].

The ISR runs at the IRQL of the device, which means that other threads at the same or lower IRQL have to wait until the ISR has completed. Because of this, as little work as reasonably possible should be performed in the ISR. Most of the time, ISRs only perform hardware-dependent work, such as moving data to or from hardware registers to kernel-mode buffers. As mentioned earlier, the number of kernel-mode functions available in an ISR is very limited[2, 25].

9. Because of the limited kernel-mode functionality available, the ISR often schedules a DPC for later execution, which will take care of the processing not performed in the ISR[25].

10. The scheduling of DPCs is handled by the I/O Manager and is implemented as a FIFO queue. Although the DPC queue is of FIFO type, drivers can set the priority of the DPC to high, which will make the I/O Manager place the DPC first in the queue.

11. The DPCs run at DISPATCH_LEVEL and have full access to the kernel-mode functions. The DPCs complete the work of the device driver that for various reasons could not or should not be performed in an ISR. After the work in the DPC is done, the DPC marks the IRP as completed and sends it back to the I/O Manager, which in turn destroys it[25, 2].

12. When the I/O Manager has destroyed the IRP, it schedules a kernel-mode APC. This APC will execute I/O Manager code that copies status and transfer-size information to the user-thread. The APC needs to execute in the context of the requesting user-thread since it needs to safely access the user-space memory. By running the APC at the same priority level as the requesting thread, page faults can be handled normally.

If the I/O request included a data read from a device with the buffered I/O read method, the APC copies the driver-allocated buffers back to the user-space buffers of the requesting thread (from the nonpaged pool to the paged pool accessible by the user-thread). When the APC has completed its execution, the I/O Manager notifies the requesting user-thread[2, 25].
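The skeleton below ties the driver-side steps together (a sketch under the WDM conventions described above, not code from the thesis; error handling, device start-up, and the ISR itself are omitted, and all names are placeholders). It shows a dispatch routine that marks the IRP pending (step 3), a Start I/O routine invoked by the I/O Manager (step 4), and a DPC-for-ISR that completes the IRP (steps 10 and 11).

    #include <ntddk.h>

    // Step 3: dispatch routine, invoked by the I/O Manager for read/write IRPs.
    NTSTATUS DispatchReadWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        IoMarkIrpPending(Irp);                        // device activity is needed
        IoStartPacket(DeviceObject, Irp, NULL, NULL); // queue for the Start I/O routine
        return STATUS_PENDING;
    }

    // Step 4: Start I/O routine, called when the device is free to take the IRP.
    VOID StartIo(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
        UNREFERENCED_PARAMETER(DeviceObject);
        UNREFERENCED_PARAMETER(Irp);
        // Program the device registers here; the later device interrupt runs the
        // ISR, which calls IoRequestDpc() to queue the DPC-for-ISR below (step 9).
    }

    // Steps 10-11: DPC-for-ISR, runs at DISPATCH_LEVEL after the ISR.
    VOID DpcForIsr(PKDPC Dpc, PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
    {
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(Context);

        Irp->IoStatus.Status = STATUS_SUCCESS;        // result of the transfer
        Irp->IoStatus.Information = 0;                // bytes transferred (illustrative)

        IoStartNextPacket(DeviceObject, FALSE);       // let the next queued IRP start
        IoCompleteRequest(Irp, IO_NO_INCREMENT);      // hand the IRP back to the I/O Manager
    }

    // Registration in DriverEntry (fragment):
    //   DriverObject->MajorFunction[IRP_MJ_READ]  = DispatchReadWrite;
    //   DriverObject->MajorFunction[IRP_MJ_WRITE] = DispatchReadWrite;
    //   DriverObject->DriverStartIo               = StartIo;
    //   IoInitializeDpcRequest(deviceObject, DpcForIsr);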
4.6.5 Floating-Point Operations
According to the WDM documentation, drivers should avoid doing any floating-point operations unless absolutely necessary, for performance reasons[25, 36]. Before carrying out floating-point operations, a special kernel routine needs to be called to save the nonvolatile floating-point context. After the floating-point operations are finished, another kernel routine must be called to restore the nonvolatile floating-point context again[25]. Callers of these kernel routines must be running at IRQL ≤ DISPATCH_LEVEL. In other words, floating-point operations are not allowed in ISRs[25].
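A minimal sketch of this save/restore discipline (illustrative, not from the thesis; the scaling operation is an arbitrary example) looks as follows:

    #include <ntddk.h>

    // Must be called at IRQL <= DISPATCH_LEVEL, i.e. never from an ISR.
    double ScaleSample(double sample)
    {
        KFLOATING_SAVE saveArea;
        double result = 0.0;

        // Save the nonvolatile floating-point context before touching the FPU.
        if (NT_SUCCESS(KeSaveFloatingPointState(&saveArea))) {
            result = sample * 0.5;                    // the actual floating-point work
            KeRestoreFloatingPointState(&saveArea);   // restore the saved context
        }
        return result;
    }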
5 Real-Time Aspects of XP
While the previous section provided an overview of XP, this section analyses
the real-time aspects of the OS and compares the system characteristics with
the previously mentioned RTOS requirements. Finally, the reasons why XP
is not suitable for hard real-time applications are explained.
XP is a GPOS for PCs[34] and as such, the priority for the OS is to optimize average performance, not minimize or limit worst-case performance.
For a real-time application, the WCET is more relevant, since it is a guarantee that the execution time will never exceed a certain limit[33]. The average
performance, on the other hand, is irrelevant in the RTOS context, since it
gives no guarantee regarding execution time for a particular execution.
5.1 Design Issues That Limit XP's Use As a RTOS
There are several design issues in XP limiting its use as a RTOS[24]:
• No priority inversion protection exists
Threads running in the Realtime class can be blocked by lower priority
threads holding a shared resource. No mechanism to prevent this exists
in XP.
• Limited number of priorities
As explained in Section 4.3, only 7 priority levels are available for Realtime threads. This is only sufficient for very simple real-time applications and severely limits the amount of control a system designer has over thread priorities.

• DPCs are processed in FIFO order
Even though different interrupt priority levels exist, the bulk of the processing in a device driver is done in a DPC, which is processed in FIFO order. This makes time critical processing unsuitable even at this priority level, since it may be delayed indefinitely by less critical processing scheduled earlier in the FIFO queue. DPCs can also be delayed by ISRs of any priority level.

It is possible to specify a higher priority when scheduling a DPC. This will place the DPC first in the DPC queue. However, there is no guarantee that other device drivers will not do the same, which would only invert the DPC processing order.
• Masking interrupts
Any code running in kernel-mode, including all device drivers, can disable interrupts or raise the IRQL to the highest level, which effectively gives the code exclusive access to the CPU. This can lead to unpredictable results.
This could potentially be used by a small real-time application that wants to increase its temporal determinism (see the sketch after this list), but there is no guarantee that other non-critical device drivers in the system would not take advantage of this too.
• Page swapping
XP's use of virtual memory leads to page swapping, which can occur at
any point during the execution of a thread. However, virtual memory
can be turned off in XP, effectively eliminating this design issue.
• IRQL mapping
The HAL dynamically maps interrupts to IRQLs at system startup as
it detects the devices attached. This leads to reduced portability and
predictability of a real-time application, since it is not possible to know
the order of device interrupts when hardware changes.
By reducing the number of device drivers used in the system and making sure that as few drivers as possible share the same IRQL, a higher
level of predictability can be achieved.
• Interrupts and DPCs have higher priority than Realtime threads
Even threads running at the highest user-level priority can be delayed
indefinitely because of interrupting ISRs and DPCs.
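As a concrete illustration of the masking-interrupts item above, the following is a minimal sketch of how any kernel-mode code can raise the IRQL around a short critical region. The routine name RunAtHighIrql is illustrative only; the fact that any other driver can do the same is exactly why no guarantees can be given.

    #include <ntddk.h>

    /* Minimal sketch; RunAtHighIrql() is an illustrative name. */
    VOID RunAtHighIrql(void)
    {
        KIRQL oldIrql;

        /* Raise the IRQL so that all maskable interrupts are blocked
           on this processor. */
        KeRaiseIrql(HIGH_LEVEL, &oldIrql);

        /* ... short, time-critical region ... */

        /* Restore the previous IRQL as soon as possible. */
        KeLowerIrql(oldIrql);
    }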
5.2 Using XP as a RTOS
Different approaches to using XP as a RTOS are suggested throughout the literature, where the most common alternatives are[32, 27, 17]:
• Use XP as it is, but with a constrained environment for applications
and functionality to ensure timing constraints. Future development of
such a system is hard, and no guarantees of deadlines can be given.
• Implement the time-critical parts as a device driver running in kernel-mode. The richness of the entire Win32 application program interface (API) cannot be utilized. Debugging becomes more difficult and critical, since bugs can crash the whole system[3, 5].
• Create a wrapper for the Win32 API on top of a commercial RTOS. No COTS software can be used, and neither can the Windows device drivers.
• Run Windows XP and a RTOS on two different machines. Both hardware and software costs increase.
• Run Windows XP and a RTOS on a single machine.
This report will focus on the first two approaches.
6 Extensions
In this section, the approach of running Windows XP and a RTOS on a single-processor machine will be examined, using real-time extensions available from third parties. All the extensions have slightly different implementations, but all of them have made some modifications to the HAL, or at least intercept the interrupts before they reach the HAL (which can in itself be seen as a modification)[32]. Note that not all of these extensions make permanent changes to the HAL. Instead, a reconfiguration of the HAL is done at system startup. The extensions include:
• CeWin and VxWin by Kuka Controls[18].
• HyperKernel by Nematron[12].
• RTX by Ardence[29].
• INtime by TenAsys[15].
Since information on CeWin, VxWin, and HyperKernel is sparse, this report will not focus on those three solutions. RTX and INtime, however, will be described in more depth, since more research has been done on those technologies.
6.1 RTX
6.1.1 Architecture
The RTX runtime environment is implemented as something called a Real-Time Sub-System (RTSS). This is actually a kernel device driver for Windows XP. Achieving real-time performance this way is possible thanks to the standard device driver model and the fact that the HAL is customizable. By combining these two techniques, a temporally predictable model for building real-time systems is possible[10].
The RTSS is implemented as a system capable of stopping Windows from masking interrupts, using its own scheduler, and handling synchronization, to name a few features[10]. Since the RTSS runs as a kernel device driver, applications written in RTX will also run in kernel-mode. This mode offers no protection against memory or stack overruns; such errors would likely give an unreliable execution environment and result in a system crash.
The HAL modifications used in RTX have been implemented as extensions instead of as an entire replacement. This makes the RTSS compatible with all existing versions of the Windows XP, Windows 2000, and Windows 2003 platforms. New Windows service packs can be installed without affecting the RTSS environment. The RTSS relies on the HAL extensions to operate correctly[10]. The extended HAL used in RTSS is called RT-HAL throughout this section.
The standard Windows HAL was modified for the following three reasons[10]:
1. To make it impossible for Windows XP threads to interrupt the RTSS
or mask the RTSS-managed devices. RT-HAL intercepts interrupt
masks coming from the Windows threads and manipulates this mask,
so that no RTSS-controlled interrupts can be masked.
2. To increase the resolution of the Windows XP provided timers to 100
µs, instead of 1000 µs.
3. To provide a shutdown handler for the Windows XP environment,
which makes it possible for the RTSS to carry on after a traditional
bluescreen Windows crash. The RTSS applications are responsible for
managing the shutdown handler, and it is up to the real-time application developer to decide which applications should use this handler. The handler is used to clean up and reset any hardware state if a crash or normal shutdown of the XP environment occurs. However, what actually happens is the developer's responsibility.
RTX supports 256 thread priority levels. The scheduling algorithm used is round-robin[1], and the ready queue is implemented as a doubly linked list for each priority level. This increases the speed of both insertion and removal of threads compared to a singly linked list. If two threads of the same priority are ready at the same time, one of them is chosen and runs until its quantum has expired. By default, the quantum is set to infinity[10].
The RTSS uses the Windows-provided model even for RTSS interrupt handling. This may seem unwise, since previous work has shown that DPCs are not deterministic enough for real-time use[3]. However, it only catches the interrupt in Windows; the actual ISR is then run in the RTSS, if the interrupt was intended for it. The RTSS is therefore only dependent on the interrupt latency of XP[10]. Studies have shown that interrupt latencies in XP are very deterministic, enough to even run hard real-time systems in an ISR[3]. RTX has worked on lowering the interrupt latency, and claims have been made that a WCET of less than 30 µs is possible[10].
The RTSS environment uses the memory management mechanisms provided by XP, and memory allocation is done in the nonpaged memory pool[10].
This means that memory allocation by a RTSS-thread is non-deterministic.
The benefit of this memory model, according to Ardence, is that it reduces RTX resource consumption.
Communication between the XP environment and the RTSS environment
is realized with the use of queues, one in each direction. If an XP thread
needs some service from the RTSS environment, a command is inserted into
the queue as a Service Request Interrupt (SRI). The RTSS environment then
executes the service and sends a reply message back to the XP thread. Normally, SRIs for synchronization are requested by the XP environment, and SRIs for memory management and file operations are requested by the RTSS environment[10]. Priority inversion on shared resources is avoided by using priority inheritance, also known as priority promotion in most papers studying RTX[10, 1].
6.1.2 Software Development
RTX provides libraries which can be used by Visual Studio. It also provides
a useful application wizard, guiding the user through settings, and generates
skeleton source code for the applications[10, 1]. Even though applications
written for RTX run in kernel-mode, code writing and debugging can be
done in user-mode during development from within Visual Studio (version
6.0 and newer), offering a fully protected environment. Breakpoints can be
set and source code stepping can be used, just like when debugging any
normal Windows application. Final releases, however, will be compiled to
run in kernel-mode[10, 1].
6.1.3 Does RTX Meet the RTOS Requirements?
• The OS has to be multitasked and preemptible.
The OS is definitely preemptible and multitasked. Preemption can occur for both threads and ISRs. Tasks can be realized as both processes and threads in this system, allowing for lower task-switching times when memory can be shared between tasks. The scheduling algorithm used is round-robin with priority queues.
• The notion of task priority has to exist.
256 priority levels for threads exist.
• The OS has to support predictable task synchronization mechanisms.
Synchronization objects are available, such as semaphores, mutexes, and shared memory objects. A study by Timmerman et al. showed predictable behavior of the synchronization objects[32].
• The OS must support a system for avoiding priority inversion.
Priority inheritance is used to protect the system from priority inversion.
• The OS must have predictable temporal behavior.
To determine whether this requirement is met, extensive testing of this extension is needed. Memory allocation is not deterministic, since it is handled by the Windows memory management mechanism[10].
Four of the five requirements of a RTOS are definitely fulfilled by RTX. Even though the non-determinism of the memory allocation can be solved by allocating all memory needed before startup of the real-time system, the tests done in [32] are too limited to conclude that RTX offers predictable temporal behavior under all conditions.
6.2 INtime
6.2.1 Architecture
In contrast to RTX, INtime from TenAsys runs both the real-time applications and the non-real-time applications in user-mode. There is still the possibility to write a real-time application as a driver, which will run in kernel-mode[16, 28]. Running applications in user-mode protects the system from crashing because of programming errors such as null-pointer dereferences and page faults. However, applications still have the ability to gain direct access to physical memory if that is deemed necessary by the developer[28].
INtime installs a number of components in Windows. The most important ones are a Windows kernel driver and a Windows service. The kernel driver manages communication between the INtime and Windows environments. The service handles the actual loading of the INtime kernel into the system. A context switch then occurs to make the system enter the INtime kernel. In this state, all real-time activity is handled before any Windows activity; XP effectively becomes the idle task of the INtime kernel. When running in the INtime kernel, all Windows interrupts are masked. A real-time interrupt (both software and hardware) is handled directly. Thanks to monitoring of the HAL, the Windows kernel is unable to mask real-time interrupts. This means that even badly designed device drivers running in the Windows kernel, masking interrupts, cannot affect the performance of the real-time kernel[28].
The scheduling algorithm used in INtime is round-robin with 256 priority levels. 128 of these levels are used for user threads and the other 128 are used for interrupt priorities[16, 24].
The interrupt handling in INtime is similar to the one used in XP. When
an interrupt occurs, it is handled by its appropriate ISR. Just as in XP, minimal work is done in the ISR[24]. The bulk of the work is instead performed
in an interrupt thread. Interrupt threads are like DPCs in XP, but with different priority levels instead of a single FIFO queue, to increase the temporal determinism of the system. This interrupt model can also be bypassed, and processes can handle interrupts directly[28].
Memory management is handled by INtime itself, and all shared memory (used for shared resources) resides in the nonpaged memory pool. This means no swapping of shared memory can occur, which gives good temporal predictability when accessing shared memory[24].
Shared resources used within INtime are protected with semaphores.
Semaphore queues can be realized as both FIFO and priority queues. Compared to XP, which only uses FIFO queues for semaphores, the temporal behavior is more deterministic using priority queues. Priority inheritance is used on shared resources to keep priority inversion as low as possible[24, 28].
Thanks to the monitoring functionality, which makes XP the idle thread
of INtime, real-time applications can continue to run even if XP crashes. The
first thing done in case of an XP crash is to suspend the thread that schedules XP. A real-time process can then restart the Windows operating system, and
operation can be brought back to normal mode[28]. This means it is possible
to make the real-time applications completely independent of XP.
6.2.2 APIs
INtime provides the user with multiple programming APIs:
• Real-Time API
The real-time API resembles the Win32 API, which makes the transition for Windows programmers as smooth as possible[24]. The real-time API is object-based, where all objects are referenced by handles. Handles are global to the entire real-time system[28].
• Win32 API
A subset of the Win32 implementation used in Windows CE is provided by INtime. This allows some existing code to be used directly in INtime[28]. Although it is based on the Win32 API for CE, no information is given on whether it has time-deterministic behavior or not.
• APIs for the Windows environment
Windows APIs are provided to allow the non-real-time Windows environment to share objects with the real-time INtime environment. Both
real-time objects and Win32 objects can be shared by processes in the
different environments[28].
• C and C++ libraries.
INtime provides support for both Embedded C++ (EC++), with the
use of the Standard Template Library (STL), and ANSI C.
6.2.3 Software Development
Software is written in C or C++, with the entire STL available. Building
applications from start to release can be done entirely in Microsoft Visual
Studio. INtime even includes a project wizard for the IDE. This wizard eases
development and generates skeleton code for the developer. Even debugging
can be done in Visual Studio .NET with the use of breakpoints, source-level
single-stepping, and variable watching. For Visual Studio 6 users, INtime
includes a separate debugger called Spider.
6.2.4 Does INtime Meet the RTOS Requirements?
Does XP using the INtime extension fulfill the requirements put on a RTOS?
• The OS has to be multitasked and preemptible.
This requirement is fulfilled since INtime is multithreaded, and thereby multitasked. Preemption can occur at every level in the system. ISRs have different priority levels and are preemptible as well. Interrupt threads exist with 128 different priority levels. These attributes clearly fulfill the first requirement.
• The notion of task priority has to exist.
This requirement is also fulfilled, since both user- and kernel-level threads and interrupt threads have priorities.
• The OS has to support predictable task synchronization mechanisms.
The INtime kernel uses semaphores with both FIFO and priority queues,
where priority queues should give higher predictability. Acquisition
and release of semaphores have been shown to be deterministic in [32].
• The OS must support a system for avoiding priority inversion.
INtime uses priority inheritance to achieve this goal.
• The OS must have predictable temporal behavior.
In a previous study, INtime showed predictable behavior[32]. However, this study was based on version 1.20 (while the current version is 3.0), and the number of tests conducted was limited.
INtime fulfills at least four of the five RTOS requirements. However, tests and time measurements of the software are needed to determine if it offers predictable temporal behavior under all conditions.
7 Related Work
Studies of real-time performance in Windows NT (using the same scheduling
and interrupt routines as XP) have been done before. While most of the
studies focused on real-time performance of user level threads[24, 27, 4],
some studies focused on the real-time performance in device drivers[5, 3].
Most of the papers based their conclusions on time measurements, while [34]
drew its conclusion based on the inner workings of Windows NT.
7.1 User Level Thread Implementation
The testing of user-level thread performance was done in slightly different ways. While [27, 24] implemented a time-critical application, [4] only measured thread creation time and task switching under various system loads. All these tests show that temporal predictability decreases as the system load and the frequency of interrupts increase. Tasks with higher priority also seem to have better temporal predictability. The conclusions about the predictability of user-level threads are clear: Windows NT is not suitable for running real-time applications at user level; the WCET for the application in [24] was almost 10000% over the average execution time. Depending on the timing constraints, Windows NT running user-level applications could be used for soft real-time systems. According to the studies, WCETs cannot be guaranteed, which means the system must be allowed to miss deadlines sometimes.
7.2 Driver Based Implementation
The driver-based experiments differ more from each other than the user-level tests. [3] measures interrupt latencies for different drivers and interrupt vectors under various load conditions (always known before testing). The paper continues with execution-time measurements for ISRs and DPCs under the same system conditions. Measurements of context-switching latency for both processes and threads were also made, but these tests did not use as varied a system load as the tests conducted to measure interrupt latency. The results from the interrupt latency tests showed that the latency did not differ much because of system load, except when network load was present. Since the Ethernet interface was set up to raise interrupts at vector 10, the custom serial driver (connected to vector 11) would be preempted by every interrupt on the Ethernet interface.
By assigning the driver to another vector, the high latency introduced by network load could be reduced. As shown by these test results, network load had no major effect on the interrupt latency when interrupt vectors lower than 10 were used. The ISR execution time of the custom serial driver had high temporal predictability under all system loads tested. However, the ISR execution times for different drivers did not show predictable temporal determinism, but that depends on the amount of work needed in the ISR (or perhaps on badly designed drivers).
The results from the DPC latency measurements had enormous standard deviations relative to the average times. Neither thread nor process context switching gives any determinism to the system; the latency depended highly on CPU load.
The conclusion of this study is that Windows NT is suitable not only for soft real-time systems but also for hard systems, as long as all the time-critical execution is done inside the ISRs. Running at DPC level or below offers too poor determinism to be used by a hard real-time system. The final recommendation was to turn off virtual memory in the system, especially if an Integrated Drive Electronics (IDE) HDD is used, since the experiment on paging showed that IDE drivers basically had no temporal determinism at all.
The second paper, focusing on the driver implementation[5], had a different approach. It implemented a driver polling input data at a frequency programmed into the LAPIC timer (see [5] for more information) located on the CPU, which runs at the frequency of the system bus. This timer was programmed to use the highest interrupt level. However, the LAPIC timer can still be delayed if interrupts have been disabled by other drivers. The methodology descriptions in these tests were sparse, making it hard to fully understand how the tests were carried out. According to the paper, the LAPIC driver performed well on both loaded and unloaded systems. However, since some data polling occasions still missed the deadline, no hard real-time systems could be implemented successfully according to the paper. The authors stated that the LAPIC driver required a dual-processor machine, since the LAPIC is disabled by Windows XP on a uniprocessor system. This contradicts the fact that they presented test results from a system using a single Pentium 4 processor with Hyper-Threading, which is a technology to simulate a dual-processor machine (see [11]). Although this suggests that Hyper-Threading is enough, it is still a drawback of this solution.
7.3 Conclusions
In general, it seems that all papers agreed Windows NT can be used for soft
real-time systems if:
• the timing constraints are not too tight,
• the system is allowed to miss deadlines sometimes, and
• the workload is low.
Some of the papers also concluded that if all jobs are done at ISR level, even hard real-time systems can be built on Windows NT[3]. In contrast, [34] concludes that running a hard real-time system on Windows NT is out of the question. The methodologies of these two papers were very different, since [3] measured execution times of an implementation while [34] based its conclusions on an analysis of the inner workings of Windows NT.
8 Problem Description
The true trigger of this master's thesis is an anonymous business unit of ABB providing embedded-system-based products for the automation industry.
They use a traditional real-time system, where input data received from a
sensor is processed in an algorithm and then sent to an actuator. Outside
the real-time core, extra functionality is provided to make the units more
useful to the customer.
Today, the real-time core runs on dedicated hardware, isolated from the extra functionality. The trend clearly shows that the demand for extra functionality outside the core is growing. The development cost required
to meet this demand is usually very high, for several reasons:
• The systems often run on specific hardware with memory constraints and limited resources.
• The systems are sometimes running a custom-designed operating system, which means no commercial off-the-shelf (COTS) software components exist.
• Even on systems running a commercial RTOS, writing software and
reusing software components is more limited than in popular general
purpose operating systems (GPOS).
• The RTOS development environments are often complex, which makes
writing software hard.
To stay competitive in this industry, the development cost for this kind
of supportive functionality needs to be reduced. One possible solution is to
run the real-time core on the same hardware as the extra functionality and
to switch OS to a popular GPOS.
The ABB business unit wants the alternative of using XPE with a cheap
real-time extension to be explored. For their purposes, a self-written device driver implementation is optimal, since it would be cheaper than buying licenses for third-party real-time extensions. Because of the planned shipping volume, it is important to keep the manufacturing cost per unit as low as possible. The development cost is less important, as it is viewed more as a
one-time cost.
The following is a list of some of the key reasons why the ABB business
unit wants XP to be investigated:
• By using XP, development time would be reduced, since the same platform is used for both development and target units.
• Rapid prototyping using the .NET framework would be possible without access to the target unit.
• A large number of COTS and standard applications would be able to
run on the target units.
The fact that XP is designed as a GPOS, and as such does not support
hard real-time usage, is recognized by the business unit. However, the characteristics of the OS need to be thoroughly explored in order to get a deeper
understanding of its performance and limitations. Even if the results were to conclude that XP is not suitable for their purposes, they would at least have a clear reason why this is the case.
8.1 Suggested Model
Figure 5 shows the original suggested model for the embedded system, provided by the business unit. The sensor is on the left, the embedded system
running XPE is in the middle, and the actuator is on the right.
Figure 5: Sketch of the implementation suggested by the ABB business unit.
It is just meant as an overview of a possible system and is in no way finalized. For example, the communication stack suggested may be replaced with another Ethernet-based automation protocol in the real implementation.
8.2 Purpose
The purpose of this report is to reveal the characteristics of XPE as a RTOS
by investigating how XP works under the hood.
8.3 Scope
Because of the limited time available for this master's thesis, XP will be used
instead of XPE to run the tests. Too much time would otherwise be spent
on setting up and configuring an XPE installation. Since XP and XPE use the same kernel (along with scheduling algorithms, IRQLs, and HAL), this will not affect the results of the tests[37].
While Figure 5 is a model of the whole embedded system, the only part
that will be investigated in this report is the real-time characteristics of XP.
Communication from the sensor and to the actuator will be simulated.
The third-party real-time extensions will not actually be tested. The
focus will be solely on the real-time characteristics of XP itself.
There are other OSs relevant to the assigner. For example, Windows CE 5.0 is a scalable OS with real-time capabilities that allows applications to be developed in a familiar Windows environment[22]. However, CE does not have the same rich availability of COTS software and applications as XP, and the .NET Compact Framework is more limited than the standard .NET Framework[6].
Windows CE will not be investigated in this report.
9 Methodology
In order to measure the real-time characteristics of XP, a number of tests
on a dedicated system were conducted. Because Microsoft is very protective
about the source code for XP, at best a black box approach to performance
analysis was possible.
The tests evaluated the performance aspects that aect the determinism
and responsiveness of XP as a real-time system, which included: ISR latency,
interrupt execution, DPC latency, DPC execution, and communications between user-mode and device drivers.
9.1 Conducted Tests
This report focused on the first two approaches to using XP as a RTOS explained in Section 5.2: using XP as it is with a standard user-mode process, and implementing the time-critical parts as a device driver running in kernel-mode.
The latter approach was in turn divided into two separate tests: one implementing the time-critical parts in a DPC, and another implementing them in a prioritized DPC. An ISR implementation was not considered because of the inability to safely perform floating-point operations, as mentioned in Section 4.6.5.
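The two driver variants (normal and prioritized DPC) can be set up with the standard WDM DPC routines. The following is a minimal sketch assuming the use of KeInitializeDpc(), KeInsertQueueDpc(), and KeSetImportanceDpc(); the device-extension layout and routine names are illustrative and do not reproduce the actual test driver.

    #include <ntddk.h>

    /* Illustrative device extension; not the thesis's actual driver. */
    typedef struct _DEVICE_EXTENSION {
        KDPC AlgorithmDpc;
    } DEVICE_EXTENSION, *PDEVICE_EXTENSION;

    /* DPC routine: runs at DISPATCH_LEVEL after the ISR has queued it. */
    VOID AlgorithmDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
    {
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(Context);
        UNREFERENCED_PARAMETER(Arg1);
        UNREFERENCED_PARAMETER(Arg2);
        /* ... run the simulated algorithm and set the actuator pin here ... */
    }

    /* Called once at driver initialization. */
    VOID SetupDpc(PDEVICE_EXTENSION Ext, BOOLEAN Prioritized)
    {
        KeInitializeDpc(&Ext->AlgorithmDpc, AlgorithmDpcRoutine, Ext);
        if (Prioritized) {
            /* HighImportance places the DPC at the head of the queue
               instead of at the tail (normal FIFO order). */
            KeSetImportanceDpc(&Ext->AlgorithmDpc, HighImportance);
        }
    }

    /* ISR: runs at DIRQL, does minimal work and queues the DPC. */
    BOOLEAN SensorIsr(PKINTERRUPT Interrupt, PVOID ServiceContext)
    {
        PDEVICE_EXTENSION ext = (PDEVICE_EXTENSION)ServiceContext;
        UNREFERENCED_PARAMETER(Interrupt);
        KeInsertQueueDpc(&ext->AlgorithmDpc, NULL, NULL);
        return TRUE;
    }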
The tests simulated a typical system used in the automation industry,
where a sensor transmits an input to an embedded real-time system, which
performs calculations on the input, and thereafter sends the results to an
actuator. Time measurements were conducted on specific events, as well as the full event cycle, to evaluate the determinism of XP.
The sensor input was simulated using a tone generator connected to the
acknowledge (ACK) pin on the parallel port, which generated a hardware
interrupt at IRQL 3 in the CPU. A custom device driver was written to
handle the interrupts and start the event cycle.
The processing of sensor input was simulated using an algorithm performing a fixed number of floating-point operations (e.g. multiplications, divisions, etc.). Originally, this algorithm came from the ABB business unit triggering this master's thesis, but because the algorithm contained many compiler warnings and was generally very complex, it was replaced with a simpler one using only a fraction of the source code needed for the original algorithm. Tests were conducted to make sure the execution time of the new algorithm was equal to that of the original one.
Finally, the output to the actuator was simulated by setting a parallel
port pin in the device driver.
9.1.1 User-Thread Implementation
In the user-thread implementation, communication with the input and output was handled by read and write calls to the device driver. The algorithm was then processed in a user-thread with the highest priority (31). Figure 6
shows the full event cycle for this implementation.
Although the user-thread starts by calling the read() function of the
driver, the actual event cycle (input from sensor to output to actuator) starts
when the hardware interrupt occurs. This is event number 6 in Figure 6.
After the algorithm is processed, the user-thread calls the write() function of the device driver, which simulates the communication with the actuator
by setting a parallel port pin.
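Priority level 31 can be requested from a Win32 application by combining the Realtime priority class with the time-critical thread priority. The following is a minimal sketch of the test loop; the device name \\.\SimDriver, the buffer sizes, and the commented-out algorithm call are hypothetical and only illustrate the structure described above.

    #include <windows.h>

    int main(void)
    {
        HANDLE dev;
        char in[64], out[64];
        DWORD n;

        /* REALTIME_PRIORITY_CLASS together with THREAD_PRIORITY_TIME_CRITICAL
           corresponds to priority level 31. */
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

        /* "\\.\SimDriver" is a hypothetical device name for the custom driver. */
        dev = CreateFileA("\\\\.\\SimDriver", GENERIC_READ | GENERIC_WRITE,
                          0, NULL, OPEN_EXISTING, 0, NULL);
        if (dev == INVALID_HANDLE_VALUE) {
            return 1;
        }

        for (;;) {
            ReadFile(dev, in, sizeof(in), &n, NULL);    /* blocks until the sensor interrupt */
            /* algorithm(in, out);                         the simulated processing step */
            WriteFile(dev, out, sizeof(out), &n, NULL); /* driver sets the actuator pin */
        }
    }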
Figure 7 shows the sequence of events, where the vertical axis represents
the priority level of the executing event.
Figure 6: Full event cycle of the user-thread implementation.

Figure 7: Sequential time diagram of the user-thread implementation.
9.1.2 Driver Implementation
In the driver implementation, the algorithm was executed directly in the
DPC of the driver. Figure 8 shows the full event cycle for the implementation.
As with the user-thread implementation, the event cycle starts when the
interrupt occurs.
Figure 9 shows the sequence of events, where the vertical axis represents
the priority level of the executing event.
Because of the fewer events and higher priority levels of the driver implementation, it was reasonable to believe it would have better performance
than the user-thread implementation. More specifically, lower WCETs were
expected.
Figure 8: Full event cycle of the driver implementation.
9.2 Test System
The tests were conducted on a dedicated PC system running Windows XP
Professional with Service Pack 2 installed. The hardware on which the
measurements were conducted consisted of an ICP Electronics NANO-7270
motherboard, a Pentium M 1.6 GHz processor, and a Fujitsu MHT2060BH
SATA hard disk drive. Attached to it were a standard USB keyboard and a
PS/2 mouse.
Figure 9: Sequential time diagram of the driver implementation.
9.2.1 System Services
All Windows XP system services not needed for the test system, such as
Server, Workstation, and DNS, were disabled. Virtual memory was disabled
as well. Only the most critical services required for XP to run properly were
enabled, namely:
• Plug and Play
• Remote Procedure Call (RPC)
9.3 Execution Time Measurement
To determine the real-time performance of XP, the execution times of the different aspects described earlier in this section were measured. Three different methods for measuring execution time were considered:
• using the performance counter (PeC) available in the Win32 API,
• using the time-stamp counter (TSC) of the processor, and
• using an oscilloscope to externally measure signals on the parallel port.
9.3.1 Performance Counter
Measurements using the PeC were performed using the two methods provided in the Win32 API: QueryPerformanceCounter() and QueryPerformanceFrequency(). The QueryPerformanceFrequency() function returns the
number of clock ticks per second, while the QueryPerformanceCounter()
function returns the current counter value. Unfortunately, the PeC uses different hardware timers on different systems. Most platforms without any processor power-saving technology such as SpeedStep use the TSC of the processor as the timer, while other systems use the chipset, BIOS, or power management timer[38].
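As a rough illustration, two consecutive PeC readings can be taken as follows. This is only a sketch of the measurement pattern, not the actual test program; in the second test condition described below, a Sleep() call would additionally be placed between measurements.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq, start, stop;
        double elapsedUs;

        QueryPerformanceFrequency(&freq);      /* counter ticks per second */

        QueryPerformanceCounter(&start);
        /* ... measured region (left empty here, which gives the latency
               of the two counter reads themselves) ... */
        QueryPerformanceCounter(&stop);

        elapsedUs = (double)(stop.QuadPart - start.QuadPart) * 1e6
                    / (double)freq.QuadPart;
        printf("elapsed: %.3f us\n", elapsedUs);
        return 0;
    }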
This counter was evaluated under two different test conditions. The first condition stored two consecutive readings of the PeC (start and stop time)
6.8 million times in a for-loop. This test condition generated a theoretical
worst-case latency, since this test utilized 100% of the CPU. This makes it
more likely for kernel-mode tasks such as scheduling to interrupt the PeC
readings.
In the second test condition, two consecutive readings were stored in the
same manner, but with an added Sleep() statement after each start and
stop measurement. This simulated a real-time system used in the automation
industry more closely, where the system waits for an input sent from a sensor.
The number of instructions required for reading the PeC is insignificant
compared to the entire test system and algorithm. As a result, it is more
likely that the system will be interrupted when not reading the PeC, which
made this condition more applicable to a real-world application.
On the test platform, the timer was running at a frequency of 3,579,545 Hz,
which gives a resolution of 279 ns. The difference between the start and stop
time under both system conditions can be seen in Figure 10. Table 2 provides
a summary of the measurement statistics.
Figure 10: Measured start-stop time versus measurement number for the Performance Counter. (a) With Sleep(), (b) Without Sleep().
The test of the PeC without a Sleep() statement showed two discrete
levels, one at 838 ns and another at 1,117 ns, which are equal to 3 and 4
ticks respectively on the PeC. Both these levels probably represent the normal latency introduced by two consecutive time-stamps. Since the processor/performance-counter frequency ratio is about 450 to 1, we can assume
that the normal latency of two time-stamps is somewhere between 3 and 4
ticks of the PeC.
The second test of the PeC with the added Sleep() statement showed
three discrete levels; the same two as in the previous test, and a third level
with a slightly higher latency than the other two. Even though the PeC
usually gives a low latency for making the time-stamps, the tests showed
maximum values as high as 126.27 µs and 45.26 µs for the first and second
test, respectively.
Figure 10 and the standard deviation of these tests showed that the PeC
was unsuitable for time measurements in our applications, since samples were spread over the entire spectrum from 838 ns to 126.27 µs (45.26 µs for
the second test).
9.3.2 Time-Stamp Counter
All processors built on the IA-32 architecture, starting with the Pentium processor, have a built-in TSC. The clock tick frequency of this counter varies between processor families. On some processors, the counter is increased at a constant rate determined by the processor configuration, while others increase the counter with every internal processor clock cycle[14]. In the P6
family (Pentium, Pentium M, Pentium 4, and Xeon) the TSC is implemented
as a 64-bit counter and is guaranteed to not wrap around within 10 years
after being reset[14].
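One way to read the TSC from C with the Microsoft compiler is the __rdtsc() intrinsic; whether the tests used this intrinsic or inline assembly is not stated, so the sketch below is only illustrative. The conversion factor assumes the 1.6 GHz test platform, where one tick corresponds to 0.625 ns.

    #include <intrin.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned __int64 start, stop;

        start = __rdtsc();
        /* ... measured region ... */
        stop = __rdtsc();

        /* 0.625 ns per tick assumes a constant 1.6 GHz clock
           (SpeedStep disabled). */
        printf("elapsed: %.1f ns\n", (double)(stop - start) * 0.625);
        return 0;
    }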
Figure 11: Measured start-stop time versus measurement number for the Time-Stamp Counter, (a) with Sleep(), (b) without Sleep().
The test platform with its 1.6 GHz Pentium M processor has the TSC
implemented as a 64-bit counter, increasing with every processor clock cycle.
             PeC without   PeC with      TSC without   TSC with
             Sleep()       Sleep()       Sleep()       Sleep()
Min          0.84          0.84          0.03          0.03
Mean         1.07          0.99          0.03          0.03
WCET         126.27        45.26         60.56         39.65
Std. dev.    0.37          0.22          0.05          0.02

Table 2: Measured start-stop time in µs for the PeC and TSC.
Since the test was conducted with SpeedStep technology disabled, constant
clock frequency was guaranteed, giving a resolution of 0.625 ns. To compare
the latency of the TSC with that of the performance counter, the two evaluation tests explained in the previous section were conducted on the TSC as
well.
The TSC showed lower latency for two consecutive readings compared to the performance counter. The different test conditions had a higher impact on the results than during the test of the PeC. As seen in Figure 11, the test without any Sleep() statement has the same latency for almost all the samples, whereas the test condition with the Sleep() statement added shows three discrete levels, where the highest of these three only occurred during the last half of the test. However, the test without a Sleep() statement got a higher worst-case latency and a higher standard deviation than the other test. Table 2 shows the minimum, mean, WCET, and standard deviation (in µs) of all four tests conducted on both the PeC and the TSC. Even though the maximum values of the TSC were in the same range as those of the PeC, only a few of the samples reached a time higher than 1 µs. The low latency, and the fact that it is unlikely for two consecutive readings to have a latency higher than 1 µs, gave a determinism good enough to be suitable for the measurements in our conducted tests.
9.3.3 Oscilloscope
The ISR latency (the delay between a hardware interrupt and the start of the
ISR execution) is impossible to measure using either the PeC or the TSC,
since the start of the event occurs on the OS level, which the test implementations have no control over. For this reason, an external measurement
approach was also necessary.
An Agilent Infiniium 54833D MSO oscilloscope was connected to the parallel port of the motherboard, measuring the voltage on selected pins. To verify the reliability of this measurement method, a simple test was conducted where a parallel port pin was set (logical 1) and then immediately unset (logical 0) again. This test was run in a user-thread of priority 31 (Realtime) and was iterated one million times.
Normally, a user-thread is not allowed to write to port registers because it
is a restricted kernel-mode operation and will cause a Privileged Instruction
Exception. Because of this, a third-party solution called AllowIO was used,
which can grant the process full rights to any port[26].
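The pin set/unset test might look roughly like the following sketch. The port address 0x378 is the conventional LPT1 data-register address and is an assumption; the actual address on the test system is not stated. The sketch also assumes that AllowIO has already granted the process access to the port.

    #include <conio.h>

    /* 0x378 is the conventional data register address of LPT1; the actual
       address on the test system may differ. */
    #define LPT_DATA 0x378

    int main(void)
    {
        int i;

        /* Assumes the process has been granted I/O port access,
           e.g. via the AllowIO tool mentioned above. */
        for (i = 0; i < 1000000; i++) {
            _outp(LPT_DATA, 0x01);   /* set a data pin (logical 1) */
            _outp(LPT_DATA, 0x00);   /* clear it again (logical 0) */
        }
        return 0;
    }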
The results of the test showed a maximum jitter of less than 5 µs, with a
maximum execution time of 6.21 µs and an average execution time of 1.37 µs.
This was significantly more deterministic than using the PeC or TSC, and the accuracy was sufficient for the other tests conducted.
One significant limitation with the oscilloscope was its inability to save each individual measurement to a file for later analysis. The oscilloscope
was only capable of calculating the WCET, minimum execution time, mean
execution time, and the standard deviation on the collected data set.
9.4 System Load Conditions
The real-time application tests were conducted using different load conditions in order to evaluate the performance impact. Several applications were
developed to realize these load conditions. These applications were developed in Visual Studio .NET. A custom device driver was also developed, to
allow the measurements of ISR/DPC latency and performance, and communication between device drivers and user-processes.
9.4.1 Idle
When the system was idle, no other processes than the ones necessary for XP
to function properly were running. Network was disabled, and the keyboard
and mouse were not used.
9.4.2 CPU Load
In this system load, a simple C application was developed, running an endless for-loop to utilize 100% of the CPU. The process was running at the Normal process and thread priority levels.
9.4.3 Graphics Load
A custom application was written in Visual Basic .NET to realize this system
load. It dynamically created many graphical user interface (GUI) controls
and then moved, resized, and changed properties on them. The purpose
was to test how GUI rendering of normal applications affected real-time
performance.
9.4.4 HDD Load
Two large files were copied back and forth on the HDD to determine how disk activity affects real-time performance. A simple batch script was used
to achieve this load condition.
                 Idle            CPU Load        Graphics Load       Hard Drive Load   Network Load        Stress
User-thread      UserIdle        UserCPU         UserGraphics        UserHDD           UserNetwork         UserStress
DPC              DriverIdle      DriverCPU       DriverGraphics      DriverHDD         DriverNetwork       DriverStress
Prioritized DPC  DriverPrioIdle  DriverPrioCPU   DriverPrioGraphics  DriverPrioHDD     DriverPrioNetwork   DriverPrioStress

Table 3: Test names used throughout the report.
9.4.5 Network Load
In this system load, a batch script was used to transfer large files over a small local network, connected with a router. In effect, both network and disk load were applied at the same time. An HP Vectra VL800 running
Filezilla Server version 0.9.12 beta was used as the File Transfer Protocol
(FTP) server. The test platform was running the console-based FTP client
NcFTP version 3.1.9.
9.4.6 Stress
In the Stress mode, all of the above load conditions were running at the same
time, to simulate a worst case scenario of the real-time application.
9.5 Test Names
Table 3 shows the names used to identify the specific tests conducted in the
various load conditions.
9.6 Additional Tests
Aside from the above tests measuring the impact of different load conditions, additional tests were conducted to measure mechanisms such as a process context switch, the time quantum of a process, etc. These test results are
not presented in the report, as they were only conducted to gain a better
understanding of the primary execution time tests.
All tests listed in Table 3 were conducted using the oscilloscope. However,
since the oscilloscope was incapable of collecting and saving each sample in
the tests, additional tests were conducted using the TSC. This provided a
graphical scatter plot of the measured execution time for every sample and
an execution time distribution of the samples.
The TSC tests measured the algorithm execution time 4,500,000 times
in each test. Because of the limited time available for these additional tests,
they were only conducted on the user-thread implementation.
10 Results
10.1 TSC Measurement Results
This section presents the results of the TSC measurements of each user-thread test graphically, using two diagrams for each test. The first diagram
shows a scatter plot of the measured execution time in µs for every sample,
while the second one shows the execution time distribution of the samples.
Table 4 shows the minimum, mean, WCET and standard deviation of
the TSC test results.
             UserIdle   UserCPU   UserGraphics   UserHDD   UserNetwork   UserStress
Min          36.88      36.89     36.87          36.87     36.88         36.88
Mean         37.27      37.40     37.29          37.40     38.36         39.40
WCET         132.24     107.15    154.74         155.42    144.48        168.87
Std. dev.    0.83       0.72      1.00           2.00      4.58          6.20

Table 4: Algorithm execution time in µs for the TSC tests.
10.1.1 UserIdle
As seen in the time distribution diagram in Figure 12, a majority of the
samples measured 37.42 µs, which is represented by the distinct lowest line
in the scatter plot diagram. This means that the majority of the algorithm
calculations in the test were not interrupted by other tasks.
The remaining samples were distributed in a time spectrum ranging from around 40 µs to 120 µs, except for two samples taking 130.55 µs and 132.24 µs,
respectively. It is possible to identify discrete levels in this spectrum, where
samples are more densely grouped. For example, one level exists at 75 µs.
However, because of the black box approach used when testing XP, the reason
why these levels exist is not known.
One interesting note about the UserIdle scatter plot diagram is the change
of characteristics after one third of the test period. The reason could be one
or more device drivers entering a power saving mode. For example, the
HDD might be spinning down after a period of inactivity. The power saving
functionality is part of the WDM development guidelines[2, 25].
10.1.2 UserCPU
The difference between UserIdle and UserCPU was minimal. A majority of the samples measured 37.43 µs, and the remaining samples were distributed in the 40 − 120 µs range. The fact that a pure CPU load did not affect the performance of the test implementation much was not surprising, since the
Figure 12: UserIdle algorithm execution time. (a) Scatter plot, (b) Time distribution.
test ran in the Realtime priority class, while the CPU load application ran
in the Normal priority class.
10.1.3 UserGraphics
In the UserGraphics test, the sample distribution in the 40 − 120 µs time
range was more dense, which means more algorithm calculations were interrupted compared to UserIdle and UserCPU. However, the WCET sample
of 154.74 µs was not much worse than the WCET for UserIdle, which indicates that the real-time performance of XP is not significantly affected by GUI stress.
10.1.4 UserHDD
In the HDD stress load condition, there was a significant increase of samples
around 50 µs, as seen in Figure 15. Also, the spectrum between 40 − 80
µs was more dense compared to the previous load conditions. However, the
WCET was just 155.42 µs, which was similar to the results in the previous
tests.
As in the UserIdle test, the UserHDD test changed characteristics after
a period of time. In this test, the change occurred after two thirds of the
time, where the samples ranging between 50 − 80 µs suddenly dropped to a
more compact range of 50 − 60 µs. The reason for this is unknown, but as
seen in Figure 15, this does not affect the WCET. In fact, the WCET sample
was measured near the end of the test where the scatter plot showed the best
temporal determinism.
Figure 13: UserCPU algorithm execution time. (a) Scatter plot, (b) Time distribution.
Figure 14: UserGraphics algorithm execution time. (a) Scatter plot, (b) Time distribution.
Figure 15: UserHDD algorithm execution time. (a) Scatter plot, (b) Time distribution.
10.1.5 UserNetwork
The network load condition had a significant number of samples around 60
µs. Also, the range between 40 − 65 µs was dense. The samples above 65
µs were distributed in a similar way as the UserHDD test, and the WCET
was 144.48 µs.
10.1.6 UserStress
The UserStress test, running all previous load conditions at the same time,
wasperhaps unsurprisinglyhaving the most impact on real-time performance. The density of samples around 60 µs was even higher here compared
to UserNetwork, but the characteristics and time distribution was similar.
However, even in this stressed load condition, no sample exceeded 170
µs. In fact, although not specically designed for it, XP seems to do a good
job keeping the WCET atwhat it seemsa limited level. The amount of
system load applied seems to have a small impact of the measured WCET.
Every one million samples, the test changed characteristics for a short
period of time, as seen in the scatter plot of Figure 17. In actual time, this
was roughly every 15 minutes. The reason for this behavior is not known,
but it did not negatively affect real-time performance.
10.2 Oscilloscope Test Results
The results of the tests conducted using the oscilloscope are briefly presented
in this section. For a complete listing of these test results, see Appendix A.
Figure 16: UserNetwork algorithm execution time. (a) Scatter plot, (b) Time distribution.
Figure 17: UserStress algorithm execution time. (a) Scatter plot, (b) Time distribution.
The oscilloscope test results showed a surprisingly good level of predictability compared to the results of the previous work in the field of XP
real-time performance[27, 24, 3].
The CPU load conditions (UserCPU, DriverCPU, and DriverPrioCPU)
had a minor impact on the tests. At most, a slightly higher standard deviation was measured, but the WCETs were similar to the idle tests (UserIdle,
DriverIdle, and DriverPrioIdle).
Similar to the TSC tests, HDD and network loads had the biggest performance impact after the stress tests.
As expected, the driver implementation had shorter average execution
times and WCETs than the user-thread implementation. For example, the UserStress WCET was 450.89 µs, while the DriverStress and DriverPrioStress
WCETs were 328.30 µs and 356.06 µs, respectively.
One surprising discovery was that the tests with prioritized DPCs had
longer execution times than the normal DPC tests in many cases. Possible
reasons why this was the case are discussed in Section 11.
11 Conclusions
A number of observations can be made from the tests conducted. The following list summarizes the most important observations, and the rest of this
chapter is devoted to explaining them in greater detail:
• XP has a better determinism than was reported in the previous work
(Section 7).
• Higher task priority yields better determinism.
• A pure driver implementation is faster and more deterministic than a
user-mode implementation.
• Task interruption can occur anywhere in a full event cycle.
• The difference in execution time and determinism between a prioritized
and a normal DPC is small.
• The algorithm execution time is longer in kernel-mode than in normal user-mode.
• No hard guarantees in terms of WCET can be given.
11.1 Better Determinism Than Reported In Previous Work
When evaluating the test results, a general reflection is that the latencies and WCETs were much more predictable than previously reported in the field of XP real-time performance[27, 24, 3]. While [24] reported application WCETs almost 10000% over the average execution time, the tests conducted here never generated full-event-cycle WCETs even ten times over the average execution time. There can be several reasons for this difference, where the most probable one is differing load conditions. It is impossible to pinpoint
the exact reasons, since the conditions under which the tests of the previous
work were conducted are not known.
11.2 Higher Task Priority Yields Better Determinism
It is clear from the results that a higher priority level yields a higher degree
of determinism. The ISR, running at the highest priority level in the tests,
had a maximum latency of 41.82 µs after an interrupt was triggered. This
was measured in the UserFile test. Compared to the mean latency of 11.96
µs, the maximum latency is roughly four times larger.
In the same test, the maximum time between the scheduling of a DPC
and its actual execution is 80.61 µs. Compared to the mean latency of 3.72
µs, the maximum latency is over 20 times larger, which is significantly larger than the latency of the ISR. The reason for this, as discussed in Section 4.4,
is the fact that a DPC can be interrupted by any interrupt, whereas the
ISR can only be interrupted by higher IRQL interrupts. Since the parallel
port uses IRQL 3 on the test system, the only devices with priority are the
keyboard and the system timer.
11.3 Driver Faster Than User-Mode
A pure driver implementation has a shorter WCET and is more deterministic
than a user-mode implementation communicating with a driver. This came
as no surprise, considering the reduced number of steps required in the driver
compared to the user-mode implementation. Also, the algorithm runs at a higher priority level (DISPATCH_LEVEL) in the driver implementation, which reduces the probability of it being interrupted.
11.4 Task Interruption Can Occur Anywhere
Every individual step in a full event cycle can be interrupted at any time.
In the test results, it is easy to see that a discrete step, such as the DPC execution, has a significantly higher WCET compared to its average execution time.
The sum of the WCETs of the individual events exceeds the actually measured WCET for the whole event cycle. This means that, theoretically,
the WCET is higher than measured in the tests. However, the test results
indicate that it is statistically very unlikely that all steps in the chain of
events will be interrupted in a single event cycle.
11.5 Small Dierence Between Normal and Prioritized DPC
The difference between a prioritized and a normal FIFO DPC in terms of
WCET and average execution time was unexpectedly small. In fact, many
of the driver implementation tests had longer execution times when using
a prioritized DPC. Two possible reasons are considered, where a mixture of
the two may be closer to the actual reason:
1. The DPC queue is never long enough for the priority to make a difference. This reason alone seems unlikely, considering the File and
Network load conditions, both generating many DPCs.
2. The other device drivers loaded in the system also use prioritized DPCs.
This is impossible to know without access to the source code for every
device driver loaded in the system.
The use of prioritized DPCs in a real-time application is therefore not advisable.
11.6 Algorithm Slower in Kernel-Mode
The algorithm executes approximately 12% faster on average in a normal user-mode thread than in the DPC. This is likely because of different levels of code optimization in the DDK compiler and the Visual Studio 2003 compiler.
11.7 No Guarantees Can Be Given
While the specific tests conducted did not yield any execution times over one
millisecond, no hard guarantees about an absolute WCET can be made. The
tests only prove that, under exactly the load conditions simulated, execution
times exceeding the results were not measured.
However, the tests show that execution times exceeding those measured are very unlikely. This indicates that XP might be
suitable as a soft RTOS under certain controlled conditions.
12 Future Work
The results from the tests showed that XP could be suitable as a soft real-time system. However, only the inner workings of XP were evaluated, which is just part of the target platform suggested by the ABB business unit. Although the temporal predictability of XP is sufficient for some soft real-time systems as it is, different techniques to further increase the determinism should be evaluated to make XP an alternative for systems with even stricter temporal constraints. The following areas would be of interest to evaluate if
more time was available:
• Use of an Ethernet based protocol for communication
• Modify interrupt handling
• Run the tests on XPE instead of XP
• Evaluate extensions
12.1 Use of an Ethernet Based Protocol for Communication
As mentioned in Section 8, all conducted tests used the parallel port to simulate communication from the sensor and to the actuator. In the original model suggested by the business unit (see Figure 5), the communication between actuator and sensor could use an Ethernet stack to decrease production cost. Evaluating the temporal behavior of the TCP/IP stack using UDP would be of interest. If the temporal predictability of this protocol stack is not sufficient, other Ethernet-based protocols could be evaluated instead.
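A minimal Winsock sketch of sending one sample over UDP is shown below. The destination address and port are placeholders, error handling is omitted, and this does not prescribe how such an evaluation would actually be set up.

    #include <winsock2.h>
    #include <string.h>
    #pragma comment(lib, "ws2_32.lib")

    int main(void)
    {
        WSADATA wsa;
        SOCKET s;
        struct sockaddr_in dest;
        const char sample[] = "actuator output";

        WSAStartup(MAKEWORD(2, 2), &wsa);
        s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

        memset(&dest, 0, sizeof(dest));
        dest.sin_family = AF_INET;
        dest.sin_port = htons(5000);                       /* placeholder port */
        dest.sin_addr.s_addr = inet_addr("192.168.0.10");  /* placeholder actuator address */

        sendto(s, sample, sizeof(sample), 0,
               (struct sockaddr *)&dest, (int)sizeof(dest));

        closesocket(s);
        WSACleanup();
        return 0;
    }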
12.2 Modify Interrupt Handling
To modify interrupt handling in XP, two different approaches are suggested as future work: modifying the source code of the HAL, or intercepting interrupts before they even reach the HAL. As mentioned in Section 4, the source code for the HAL can be delivered by Microsoft under a special agreement. Some simple modifications could possibly increase the determinism enough to make XP a more suitable alternative for systems with stricter temporal constraints.
Interception of interrupts could be done by modifying the IDT to pass all interrupts to a custom interrupt handler routine. A suggested model is presented in Figure 18.
Figure 18: Suggested model for interrupt interception.

When an interrupt occurs in this model, it is passed to the customized interrupt handler. This interrupt handler first examines the interrupt vector to determine whether the interrupt is intended for the time-critical system or not. If the interrupt was intended for the system (represented by path An in Figure 18), the custom interrupt handler sets a flag to mark that an interrupt for the system is pending, queues all incoming interrupts, executes the critical application, turns off the pending flag, and finally processes the queue of interrupts. However, if the interrupt was not intended for the time-critical application, the customized interrupt handler simply passes the interrupt on to the HAL, and processing of the interrupt is handled by the XP I/O Manager as normal (represented by path Bn in Figure 18).
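Conceptually, the dispatch logic of Figure 18 could look like the following sketch. All identifiers are illustrative stubs invented for this example; hooking the IDT and chaining to the original HAL handler are not shown, so this is a plain C model of the control flow rather than a working implementation.

    #include <stdio.h>

    #define TIME_CRITICAL_VECTOR 0x35   /* hypothetical interrupt vector */
    #define MAX_QUEUED 64

    static int pendingFlag = 0;
    static int queuedVectors[MAX_QUEUED];
    static int queuedCount = 0;

    static void ReadFromSensor(void)  { /* stub */ }
    static void Algorithm(void)       { /* stub */ }
    static void WriteToActuator(void) { /* stub */ }
    static void PassToHal(int vector) { printf("passed to HAL: vector %d\n", vector); }

    static void ProcessQueue(void)
    {
        int i;
        for (i = 0; i < queuedCount; i++) {
            PassToHal(queuedVectors[i]);  /* deferred interrupts go to the HAL */
        }
        queuedCount = 0;
    }

    void CustomInterruptHandler(int vector)
    {
        if (vector == TIME_CRITICAL_VECTOR) {        /* path A in Figure 18 */
            pendingFlag = 1;                         /* mark the interrupt as pending */
            ReadFromSensor();
            Algorithm();
            WriteToActuator();
            pendingFlag = 0;
            ProcessQueue();
        } else if (pendingFlag) {
            if (queuedCount < MAX_QUEUED) {
                queuedVectors[queuedCount++] = vector;  /* queue while the critical job runs */
            }
        } else {
            PassToHal(vector);                       /* path B: normal XP processing */
        }
    }

    int main(void)
    {
        CustomInterruptHandler(0x29);                /* ordinary interrupt */
        CustomInterruptHandler(TIME_CRITICAL_VECTOR);
        return 0;
    }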
12.3 Run the Tests on XPE
Because of the limited time available for this Master's Thesis, the tests were
conducted on Windows XP Professional instead of XPE. Although the kernel,
thread priorities, scheduling algorithms, and inter-process communication of XP
and XPE are identical, further testing on XPE would be of interest, to see whether
additional system services not needed by the ABB business unit could be disabled
to improve temporal predictability.
12.4 Evaluate Extensions
This report only had time for a brief overview of the third-party real-time
extensions. Although [32] shows promising results for the evaluated extensions, the
number of tests conducted is too small to draw any real conclusions about the
temporal predictability of the extensions. Further analysis of the available real-time
extensions would be a subject for future work in this area.
A Oscilloscope Test Results
User-thread measurements
All user-thread tests measured the full event cycle as well as selected individual
events, as described in Figure 6. The test results use the same event names as in
Figure 7. All test results are presented in µs.
UserIdle
            Min       Mean      WCET      Std. dev.
t0–tGf      106.72    110.89    186.76    1.68
t0–tAs      8.69      10.21     21.30     0.69
tAf–tBs     1.96      2.96      62.96     0.34
t0–tDf      81.67     85.38     160.55    1.51
tDf–tGf     24.59     25.52     94.88     0.67
Number of samples: 994 210
UserCPU
            Min       Mean      WCET      Std. dev.
t0–tGf      103.57    109.28    192.55    2.46
t0–tAs      8.60      10.19     25.43     0.69
tAf–tBs     2.08      2.15      66.02     0.39
t0–tDf      80.41     85.04     165.71    2.18
tDf–tGf     22.27     24.24     100.28    0.83
Number of samples: 1 064 400
UserGraphics
            Min       Mean      WCET      Std. dev.
t0–tGf      104.52    118.25    256.30    14.70
t0–tAs      8.69      10.39     22.85     0.78
tAf–tBs     2.08      2.99      85.17     0.65
t0–tDf      81.74     93.60     226.59    13.47
tDf–tGf     22.16     24.66     105.60    1.50
Number of samples: 441 060
UserHDD
            Min       Mean      WCET      Std. dev.
t0–tGf      103.05    145.05    327.89    17.79
t0–tAs      8.69      11.80     42.67     2.03
tAf–tBs     2.09      3.78      163.61    1.60
t0–tDf      81.16     119.14    298.27    16.41
tDf–tGf     21.84     25.87     123.26    2.34
Number of samples: 1 339 300
UserNetwork
            Min       Mean      WCET      Std. dev.
t0–tGf      102.99    130.04    402.52    11.31
t0–tAs      8.69      11.27     42.43     1.48
tAf–tBs     2.09      3.52      214.81    2.26
t0–tDf      81.05     104.58    358.28    10.06
tDf–tGf     21.77     25.46     130.16    3.98
Number of samples: 1 099 900
UserStress
            Min       Mean      WCET      Std. dev.
t0–tGf      102.68    141.32    450.89    20.04
t0–tAs      8.74      12.14     51.34     2.72
tAf–tBs     2.08      3.54      243.74    4.10
t0–tDf      80.13     114.55    416.50    18.14
tDf–tGf     21.98     26.77     147.96    5.60
Number of samples: 1 267 800
Driver measurements
All device driver tests measured the full event cycle as well as selected individual
events, as described in Figure 8. The test results use the same event names as in
Figure 9. All test results are presented in µs.
DriverIdle
            Min       Mean      WCET      Std. dev.
t0–tBf      56.93     58.58     119.63    0.84
t0–tAs      6.50      7.98      18.82     0.68
tAf–tBs     2.06      3.05      60.54     0.34
tAs–tAf     2.77      2.95      13.37     0.06
tBs–tBf     44.49     44.61     59.38     0.35
Number of samples: 1 294 000
DriverCPU
            Min       Mean      WCET      Std. dev.
t0–tBf      57.08     65.53     125.81    0.84
t0–tAs      6.44      7.96      18.80     0.68
tAf–tBs     2.07      2.12      59.73     0.33
tAs–tAf     2.78      2.81      13.03     0.05
tBs–tBf     44.49     52.62     67.75     0.82
Number of samples: 1 314 900
DriverGraphics
            Min       Mean      WCET      Std. dev.
t0–tBf      56.61     60.60     146.52    3.60
t0–tAs      6.48      8.19      21.91     0.81
tAf–tBs     2.06      2.92      90.39     0.64
tAs–tAf     2.77      3.00      16.85     0.16
tBs–tBf     44.49     46.49     71.61     3.43
Number of samples: 1 315 300
DriverHDD
            Min       Mean      WCET      Std. dev.
t0–tBf      56.68     65.05     190.39    5.92
t0–tAs      6.53      9.32      26.01     1.33
tAf–tBs     2.07      3.51      123.43    1.50
tAs–tAf     2.78      3.61      19.78     0.89
tBs–tBf     44.51     48.61     84.33     3.46
Number of samples: 963 450
DriverNetwork
            Min       Mean      WCET      Std. dev.
t0–tBf      57.05     65.21     259.72    5.67
t0–tAs      6.67      9.08      25.86     1.05
tAf–tBs     2.09      3.36      182.26    2.26
tAs–tAf     2.94      3.45      27.14     0.64
tBs–tBf     44.71     49.33     89.71     4.15
Number of samples: 1 353 900
DriverStress
            Min       Mean      WCET      Std. dev.
t0–tBf      64.01     72.31     328.30    7.20
t0–tAs      6.49      9.49      31.37     1.55
tAf–tBs     2.07      3.09      249.14    3.65
tAs–tAf     2.89      3.67      26.81     1.20
tBs–tBf     51.23     56.06     93.01     3.05
Number of samples: 1 312 300
DriverPrioIdle
            Min       Mean      WCET      Std. dev.
t0–tBf      56.56     58.63     110.85    0.83
t0–tAs      6.53      8.03      17.32     0.68
tAf–tBs     2.08      3.05      56.18     0.28
tAs–tAf     2.79      2.94      13.41     0.07
tBs–tBf     44.51     44.62     59.41     0.36
Number of samples: 991 920
DriverPrioCPU
            Min       Mean      WCET      Std. dev.
t0–tBf      64.00     65.57     121.72    0.84
t0–tAs      6.44      7.99      13.98     0.68
tAf–tBs     2.07      2.12      57.59     0.32
tAs–tAf     2.78      2.84      13.33     0.05
tBs–tBf     52.54     52.62     67.80     0.35
Number of samples: 906 930
DriverPrioGraphics
            Min       Mean      WCET      Std. dev.
t0–tBf      56.58     60.22     135.74    3.37
t0–tAs      6.47      8.13      17.74     0.77
tAf–tBs     2.06      2.91      80.38     0.58
tAs–tAf     2.78      3.01      12.76     0.22
tBs–tBf     44.50     46.17     71.19     3.23
Number of samples: 809 390
DriverPrioHDD
            Min       Mean      WCET      Std. dev.
t0–tBf      56.68     65.44     223.53    6.12
t0–tAs      6.56      9.38      39.20     1.35
tAf–tBs     2.07      3.55      138.00    1.49
tAs–tAf     2.79      4.25      31.30     1.02
tBs–tBf     44.51     48.26     88.63     3.50
Number of samples: 3 776 400
DriverPrioNetwork
            Min       Mean      WCET      Std. dev.
t0–tBf      56.93     68.44     297.21    7.25
t0–tAs      6.73      9.21      25.15     1.29
tAf–tBs     2.07      3.63      218.36    3.20
tAs–tAf     2.82      3.95      24.53     0.97
tBs–tBf     44.73     51.65     84.95     4.97
Number of samples: 990 750
DriverPrioStress
            Min       Mean      WCET      Std. dev.
t0–tBf      64.09     72.73     356.06    7.61
t0–tAs      6.52      9.42      33.63     1.59
tAf–tBs     2.07      3.28      278.43    3.92
tAs–tAf     2.80      3.28      28.39     1.37
tBs–tBf     52.51     55.82     96.11     3.16
Number of samples: 998 320
Algorithm Execution Time
The following test was conducted to measure the difference in execution time of
floating-point operations between the user-thread and device driver implementations.
The results are presented in µs.
                 Min       Mean      WCET      Std. dev.   Samples
User-thread      39.34     39.87     105.28    0.79        153 910
Device driver    41.67     41.76     56.40     0.32        116 370
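For reference, a user-mode measurement of this kind can be set up with the
high-resolution performance counter roughly as in the sketch below. The dummy
floating-point loop is only a stand-in for the actual control algorithm used in the
thesis, not a reproduction of it.

/* Minimal sketch of timing a floating-point section with the high-resolution
 * performance counter. The loop body is a hypothetical stand-in for the
 * thesis algorithm. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, stop;
    volatile double x = 1.0;   /* volatile keeps the optimizer from removing the loop */
    double us;
    int i;

    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    for (i = 0; i < 10000; i++)          /* dummy floating-point work */
        x = x * 1.0000001 + 0.5;
    QueryPerformanceCounter(&stop);

    us = (double)(stop.QuadPart - start.QuadPart) * 1e6 / freq.QuadPart;
    printf("execution time: %.2f us (x = %f)\n", us, x);
    return 0;
}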
References
[1] Ardence RTX Real-time Extension for Control of Windows. Ardence.
http://www.ardence.com/assets/5f940542924c4a42b30fc5584872d798.pdf.
[2] A. Baker and J. Lozano. The Windows 2000 Device Driver Book. Prentice Hall PTR, 2001.
[3] A. Baril. Using Windows NT in Real-Time Systems. In Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium (RTAS '99), pages 132–141, Washington - Brussels - Tokyo, 1999. IEEE Computer Society.
[4] L. Budin and L. Jelenkovic. Time-Constrained Programming in Windows NT Environment. In Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE '99), pages 90–94, Bled, 1999. IEEE Computer Society.
[5] J. Cinkelj et al. Soft Real-Time Acquisition in Windows XP. In Intelligent Solutions in Embedded Systems, 2005, Third International Workshop, pages 110–116, Bled, 2005.
[6] Comparisons with the .NET Framework.
http://msdn.microsoft.com/library/default.asp?url=/library/enus/dv_evtuv/html/etconcomparisonswithnetframework.asp.
[7] I. Crnkovic and M. Larsson. Building Reliable Component-Based Software Systems. Artech House, Inc., 2002.
[8] S. Daily. Introducing Windows NT 4.0. 29th Street Press, February 1997.
[9] E. Dekker and J. Newcomer. Developing Windows NT Device Drivers. Addison-Wesley, 1999.
[10] Hard Real-Time with Venturcom RTX on Microsoft Windows XP and Windows XP Embedded. Venturcom, Inc., September 2003.
http://msdn.microsoft.com/library/default.asp?url=/library/enus/dnxpesp1/html/tchHardRealTimeWithVenturcomRTXOnMicrosoftWindowsXPWindowsXPEmbedded.asp.
[11] Hyper-Threading Technology Overview. Intel Corporation.
http://www.intel.com/business/bss/products/hyperthreading/overview.htm.
[12] HyperKernel - Real-time Extensions for Windows NT/2000. Nematron.
http://www.nematron.com/HyperKernel/.
[13] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual Volume 3A: System Programming Guide, Part 1, January 2006. Order Number: 253668-018.
[14] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual Volume 3B: System Programming Guide, Part 2, January 2006. Order Number: 253669-018.
[15] INtime. TenAsys. http://www.tenasys.com/intime.html.
[16] INtime 3.0 Real-time Operating System (RTOS) Extension for Windows. TenAsys.
http://www.tenasys.com/resources/getFile.php?leid=6.
[17] D. Kresta. Getting Real with NT Approaches to Real-Time Windows NT. Real-Time Magazine, 2:32–35, 1997.
[18] KUKA Controls GmbH - Hard Real-Time Windows XP. KUKA Controls GmbH.
http://www.kuka-control.com/product/.
[19] P. N. Leroux. RTOS versus GPOS: What is best for embedded development? Embedded Computing Design, January 2005.
[20] C. Liu and J. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the ACM, 20(1), 1973.
[21] M. Lutz and P. Laplante. C# and the .NET Framework: Ready for Real-Time? IEEE Software, 20(1):74–80, 2003.
[22] Microsoft Windows CE 5.0.
http://msdn.microsoft.com/library/default.asp?url=/library/enus/wceintro5/html/wce50oriWelcomeToWindowsCE.asp.
[23] C. Nordström et al. Robusta realtidssystem. Mälardalen Real-Time Research Centre, Västerås, August 2000.
[24] K. Obenland, J. Kowalik, T. Frazier, and J. Kim. Comparing the Real-Time Performance of Windows NT to an NT Real-Time Extension. In Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium (RTAS '99), pages 142–153, Washington - Brussels - Tokyo, 1999. IEEE Computer Society.
[25] W. Oney. Programming the Microsoft Windows Driver Model. Microsoft Press, 1999.
[26] C. Peacock. PortTalk - A Windows NT I/O Port Device Driver.
http://www.beyondlogic.org/porttalk/porttalk.htm.
[27] K. Ramamritham et al. Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations. In Proceedings of the Fourth IEEE Real-Time Technology and Applications Symposium (RTAS '98), pages 132–141, Washington - Brussels - Tokyo, June 1998. IEEE Computer Society.
[28] Real-Time Operating Systems: INtime Architecture. TenAsys Corporation, September 2003.
http://msdn.microsoft.com/library/default.asp?url=/library/enus/dnxpesp1/html/tchReal-TimeOperatingSystemsINtimeArchitecture.asp.
[29] RTX. Ardence. http://www.ardence.com/embedded/products.aspx?ID=70.
[30] M. E. Russinovich and D. A. Solomon. Microsoft Windows Internals, Fourth Edition: Microsoft Windows Server 2003, Windows XP, and Windows 2000. Microsoft Press, 2005.
[31] A. Tanenbaum. Modern Operating Systems, Second Edition. Prentice Hall International, 2001.
[32] M. Timmerman et al. Designing for Worst Case: The Impact of Real-Time OS Performance on Real-World Embedded Design. Real-Time Magazine, 3:11–19, 1998.
[33] M. Timmerman and J-C. Monfret. Designing for Worst Case: The Impact of Real-Time OS Performance on Real-World Embedded Design. Real-Time Magazine, 3:52–56, 1997.
[34] M. Timmerman and J-C. Monfret. Windows NT as Real-Time OS? Real-Time Magazine, 2:6–13, 1997.
[35] M. Timmerman and J-C. Monfret. Windows NT Real-Time Extensions: an Overview. Real-Time Magazine, 2:14–24, 1997.
[36] Windows Driver Model (WDM). Microsoft Corporation, April 2002.
http://www.microsoft.com/whdc/archive/wdmoverview.mspx.
[37] Windows XP Embedded Home Page. Microsoft Corporation, November 2005.
http://msdn.microsoft.com/embedded/windowsxpembedded/.
[38] P. Work and K. Nguyen. Measure Code Sections Using The Enhanced Timer. Intel Corporation.
http://www.intel.com/cd/ids/developer/asmo-na/eng/209859.htm.