
0x00 Before

HR called twice in the morning. I was asleep, missed both calls, and thought I was done for.

Half an hour later they called again and scheduled an interview for 4 p.m. that afternoon.

0x01 During

The interview begins.

HR asked me to introduce myself.

HR asked where I am from.

HR asked about my school.

HR asked me to describe my project experience and explain what the projects were for.

HR asked whether I plan to go to graduate school, and why I made that choice.

HR asked me to talk about the differences between Java and C++.

HR asked when I could start.

HR asked for more project details.

HR asked whether I was interviewing with other companies, and which offer I would accept if I received several.

HR asked whether I have any relatives working at Tencent.

0x02 After

We added each other on WeChat and I finished the online verification. Now waiting for the offer approval.

Lecture 5: Job Schedule

What is context switching?

  • Scheduling is the procedure that decides which process to run next.
  • Context switching is the actual switching procedure, from one process to another

A context switch is triggered whenever a process goes to the blocking/waiting state (e.g., wait()/sleep() is called), and also when:

  • A POSIX signal arrives (e.g., SIGCHLD)
  • An interrupt arrives (e.g., a keystroke)
  • When the OS scheduler says “time’s up!” (e.g., round-robin)
    • Put it back to "ready"
  • When the OS scheduler says “hey, I know you haven’t finished, but the PRESIDENT just arrived, please hold on” (e.g., preemptive scheduling, round-robin with priority)
    • Put it back to "ready"

Why?

  • For multi-tasking
  • To fully utilize the CPU

CPU Switch From Process A to Process B

  • This is also called a “context switch”
  • Code executed in kernel above is overhead

Context switching

Suppose this process gives up running on the CPU, e.g., calling sleep(). Then:

  • Running --> Interruptible Wait

Now, it is time for the scheduler to choose the next process to run.

But, before the scheduler can seize control of the CPU, a very important step has to be taken:

  • Backup all current context of that process to the kernel-space’s PCB:
    • current register values
    • program counter (which line of code the current program is at)

Say the scheduler decides to schedule another process in the ready queue. Then, the scheduler has to load the context of that process from the main memory into the CPU.

We call the entire operation: context switching.

Context switch is expensive

  • Direct costs in kernel:
    • Save and restore registers, etc.
    • Switch address space (expensive instructions)
  • Indirect costs: cache, buffer cache, & TLB misses
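One rough way to get a feel for this cost is to bounce a byte between two processes over a pair of pipes: every round trip forces at least two context switches. Below is a minimal sketch of that measurement; the iteration count is arbitrary, CLOCK_MONOTONIC is just one reasonable clock choice, and the printed figure also includes pipe read/write overhead, not the pure switch cost.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define ROUNDS 10000

int main(void) {
    int p2c[2], c2p[2];               /* parent->child and child->parent pipes */
    char b = 'x';
    struct timespec t0, t1;

    if (pipe(p2c) < 0 || pipe(c2p) < 0) { perror("pipe"); exit(1); }

    if (fork() == 0) {                /* child: echo every byte back */
        for (int i = 0; i < ROUNDS; i++) {
            read(p2c[0], &b, 1);
            write(c2p[1], &b, 1);
        }
        exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) { /* parent: send and wait for the echo */
        write(p2c[1], &b, 1);
        read(c2p[0], &b, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* each round trip involves at least two context switches */
    printf("~%.0f ns per round trip (>= 2 context switches)\n", ns / ROUNDS);
    return 0;
}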

What is process scheduling?

  • Scheduling is an important topic in the research of the operating system.
    • Related theoretical topics are covered in computer system performance evaluation.
  • Scheduling is required because the computing resource – the CPU – is limited.

| | CPU-bound process | I/O-bound process |
|---|---|---|
| Behavior | Spends most of its running time on the CPU, i.e., user time > sys time | Spends most of its running time on I/O, i.e., sys time > user time |
| Examples | AI course assignments | /bin/ls, networking programs |

Classes of process scheduling

Preemptive scheduling (non-preemptive is out of scope)

  • What is it?
    • When a process is chosen by the scheduler, the process would have the CPU until...
      • the process voluntarily waits I/O, or
      • the process voluntarily releases the CPU, e.g., exit()
      • particular kinds of interrupts (e.g., periodic clock interrupt, a new process steps in) are detected.
  • History
    • In old days, it was called “time-sharing”
    • Nowadays, all systems are time-sharing
  • Pros
    • Good for systems that emphasize interactivity,
    • because every task receives attention from the CPU
  • Cons
    • Bad for systems that emphasize the total time to finish tasks

Scheduling algorithms

  • Inputs to the algorithms

Online vs Offline

An offline scheduling algorithm assumes that you know the sequence of processes that a scheduler will face

  • Theoretical baseline

An online scheduling algorithm does not have such an assumption.

  • Practical use

Algorithm evaluation

  • Number of context switches
  • Individual & average turnaround time
  • Individual & average waiting time

  • Turnaround time
    • The time between the arrival of the task and the termination of the task.
  • Waiting time
    • The accumulated time that a task has waited for the CPU.

Different algorithms

  • Algorithms
    • Shortest-job-first (SJF)
    • Round-robin (RR)
    • Priority scheduling with multiple queues.
    • … (lab session)

Assumption: context switch is free (in practice, it is expensive)
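As a concrete illustration of these metrics, here is a minimal non-preemptive SJF sketch, assuming all processes arrive at time 0 and using made-up burst lengths; it sorts by burst time and accumulates the waiting and turnaround times defined above.

#include <stdio.h>
#include <stdlib.h>

/* hypothetical CPU bursts, all arriving at time 0 */
static int burst[] = { 7, 3, 1, 4 };
#define N ((int)(sizeof(burst) / sizeof(burst[0])))

static int cmp(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

int main(void) {
    qsort(burst, N, sizeof(int), cmp);       /* SJF: shortest burst first */

    int clock = 0;
    double total_wait = 0, total_turnaround = 0;
    for (int i = 0; i < N; i++) {
        int waiting = clock;                 /* time spent waiting for the CPU */
        int turnaround = clock + burst[i];   /* arrival (0) to termination */
        printf("burst=%d waiting=%d turnaround=%d\n", burst[i], waiting, turnaround);
        total_wait += waiting;
        total_turnaround += turnaround;
        clock += burst[i];
    }
    printf("avg waiting=%.2f avg turnaround=%.2f\n",
           total_wait / N, total_turnaround / N);
    return 0;
}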

Non-preemptive SJF

SJF

Problem

  • What if the tasks that arrive after P2 all have CPU requirements < 3? The long task may never get the CPU (starvation).
  • The problem persists even in the preemptive version

Preemptive SJF

Whenever a new process arrives in the system, the scheduler steps in and selects the next task based on the remaining CPU requirements.

SJF: Preemptive or not?

The waiting time and the turnaround time decrease at the expense of the increased number of context switches.

Context switch is expensive. (That’s why we shall minimize the # of sys calls as well; on a syscall, the program switches from the user process to the kernel "process".)

Round-robin

Round-Robin (RR) scheduling is preemptive.

  • Every process is given a quantum, i.e., the amount of time it is allowed to execute.
  • Whenever the quantum of a process is used up (i.e., reaches 0), the process releases the CPU; this is the preemption.
  • Then, the scheduler steps in and chooses the next process with a non-zero quantum to run.
  • If all processes in the system have used up their quanta, the quanta are recharged to their initial values.
  • Processes therefore run one by one as a circular queue, in the basic version (i.e., no priority); see the sketch after this list.
    • New processes are added to the tail of the ready queue
    • A new process arrival won't trigger a new selection decision
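As referenced in the list above, here is a minimal round-robin sketch under simplifying assumptions: all jobs arrive at time 0, the burst lengths and the quantum are made up, and context switches are treated as free.

#include <stdio.h>

#define QUANTUM 2

int main(void) {
    int burst[]  = { 7, 3, 1, 4 };                 /* hypothetical remaining CPU needs */
    int n = sizeof(burst) / sizeof(burst[0]);
    int finish[4] = { 0 };
    int clock = 0, remaining = n;

    while (remaining > 0) {
        for (int i = 0; i < n; i++) {              /* circular scan = circular ready queue */
            if (burst[i] == 0) continue;
            int run = burst[i] < QUANTUM ? burst[i] : QUANTUM;
            clock += run;                          /* the job uses up to one quantum */
            burst[i] -= run;
            if (burst[i] == 0) { finish[i] = clock; remaining--; }
        }
    }
    for (int i = 0; i < n; i++)                    /* all arrive at 0, so turnaround = finish time */
        printf("P%d turnaround=%d\n", i, finish[i]);
    return 0;
}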

RR VS SJF

So the RR algorithm seems to get all the downsides! Why do we still need it?

The responsiveness of the processes is great under the RR algorithm. E.g., you won’t feel a job is “frozen” because every job gets the CPU from time to time!

Priority Scheduling

  • A task is given a priority (usually an integer).
  • A scheduler selects the next process based on the priority.
  • Priority + RR = a priority-ordered ready queue, where a new (higher-priority) process arrival triggers a new selection decision

Static priority

  • Every task is given a fixed priority.
  • The priority is fixed throughout the life of the task.

Dynamic priority

  • Every task is given an initial priority
  • The priority is changing throughout the life of the task.

If a task is preempted in the middle

Note:

  • it has been dequeued
  • It is re-enqueued back into the queue
  • Quantum preserved or recharged?
    • Depends
      • Preserved: needs more bookkeeping
      • Recharged: easy (assumed in this course)

Static priority scheduling – an example

Properties: each process is assigned a fixed priority when it is submitted to the system.

  • E.g., Linux kernel 2.6 has 100 priority classes, [0-99].

  • The highest priority class will be selected.
    • The tasks are usually short-lived, but important;
      • To prevent high-priority tasks from running indefinitely

Lower priority classes will be scheduled only when the higher priority classes have no tasks (see the sketch below).
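A minimal sketch of that selection rule is shown below. It assumes 100 classes with index 0 treated as the highest priority, and one simple linked list of ready tasks per class; real kernel runqueues are far more elaborate.

#include <stddef.h>
#include <stdio.h>

#define NUM_CLASSES 100

struct task {
    int pid;
    struct task *next;           /* next ready task in the same class */
};

/* one ready queue per priority class; index 0 = highest priority (assumption) */
static struct task *ready[NUM_CLASSES];

/* pick the head of the highest non-empty class; lower classes run only
   when every higher class is empty, exactly as described above */
struct task *pick_next(void) {
    for (int c = 0; c < NUM_CLASSES; c++)
        if (ready[c] != NULL)
            return ready[c];
    return NULL;                 /* nothing to run: the CPU would idle */
}

int main(void) {
    struct task t1 = { 101, NULL }, t2 = { 202, NULL };
    ready[10] = &t1;             /* a task in class 10 */
    ready[60] = &t2;             /* a lower-priority task in class 60 */
    printf("next pid = %d\n", pick_next()->pid);   /* prints 101 */
    return 0;
}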

Multiple queue priority scheduling

Definitions

  • It is still a priority scheduler.
  • But, at each priority class, different schedulers may be deployed.
  • The priority can be a mix of static and dynamic.

Real example, the Linux Scheduler.

  • A multiple queue, (kind of) static priority scheduler

Do you notice that the real (Linux) scheduler does not rely on a task's (remaining) CPU requirement values?

That is because, in practice, it is very difficult to estimate how much CPU time a task needs – CPUs are very complicated nowadays:

  • Superscalar
  • Out-of-order execution (OOE)
  • Branch prediction

Recall: Four Fundamental OS Concepts

  • Thread
    • Single unique execution context: fully describes program state
    • Program Counter, Registers, Execution Flags, Stack
  • Address space (with translation)
    • Programs execute in an address space that is distinct from the memory space of the physical machine
  • Process
    • An instance of an executing program is a process consisting of an address space and one or more threads of control
  • Dual mode operation / Protection
    • Only the "system" has the ability to access certain resources
    • The OS and the hardware are protected from user programs and user programs are isolated from one another by controlling the translation from program virtual addresses to machine physical addresses.

Recall: Process (What we knew so far)

  • Process: An instance of an executing program
    • An address space,
    • One or more threads of control
  • Heavyweight process: a process has a single thread of control

  • Two properties of heavyweight process
    • Sequential program execution stream
      • Active part
    • Protected resource
      • Passive part

Modern Process with Threads

  • Thread: a sequential execution stream within process (Sometimes called a "Lightweight process")
    • Process still contains a single Address Space
    • No protection between threads
  • Multithreading: a single program made up of a number of different concurrent activities
    • Sometimes called multitasking, as in Ada …
  • Why separate the concept of a thread from that of a process?
    • Discuss the “thread” part of a process (concurrency, parallelism)
    • Separate from the “address space” (protection)
    • Heavyweight Process ≡ Process with one thread

Single and Multithreaded Processes

  • Threads encapsulate concurrency: "Active" component
  • Address spaces encapsulate protection: "Passive" part
    • Keeps buggy program from trashing the system

Threads in a Process

  • Threads are useful at user-level: parallelism, hide I/O latency, interactivity
  • Option A (early Java): user-level library, one multi-threaded process
    • Library does thread context switch
    • Kernel time slices between processes, e.g., on system call I/O
  • Option B (SunOS, Linux/Unix variants): many single-threaded processes
  • Option C (Windows): scheduler activations
    • Kernel allocates processors to the user-level library
    • Thread library implements context switch
    • System call I/O that blocks triggers an up-call
  • Option D (Linux, MacOS, Windows): use kernel threads
    • System calls for thread fork, join, exit (and lock, unlock, …)
    • Kernel does context switching
    • Simple, but a lot of transitions between user and kernel mode

Thread State

  • State shared by all threads in process/address space
    • Content of memory (global variables, heap)
    • I/O state (file descriptors, network connections, etc.)
  • State "private" to each thread (see the pthread sketch after this list)
    • Kept in TCB = Thread Control Block
    • CPU registers (including the program counter)
    • Execution stack – what is this?
  • Execution Stack
    • Parameters, temporary variables
    • Return PCs are kept while called procedures are executing
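Referenced above: a minimal pthread sketch of the shared/private split. The global array lives in the shared address space, while each thread's id and local variable live on that thread's own stack; the two-thread setup and the values are made up. Compile with cc -pthread.

#include <pthread.h>
#include <stdio.h>

int shared[2];                       /* global data: visible to every thread */

static void *worker(void *arg) {
    int id = *(int *)arg;            /* per-thread state lives on this thread's stack */
    int local = (id + 1) * 10;       /* private local variable */
    shared[id] = local;              /* writing to the shared address space */
    return NULL;
}

int main(void) {
    pthread_t t[2];
    int ids[2] = { 0, 1 };

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);    /* wait for both threads */

    printf("shared = {%d, %d}\n", shared[0], shared[1]);   /* {10, 20} */
    return 0;
}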

Shared vs. Per-Thread State

Execution Stack Example

Thread Lifecycle

Multithreaded Processes

  • Multithreaded Processes
  • Thread Control Blocks (TCBs):

  • Switching between threads within the same process is a simple thread switch
  • Switching between threads across processes requires changes to memory and I/O address tables

Putting it Together: Process

Putting it Together: Processes

  • Switch overhead: high
    • CPU state: low
    • Memory/IO state: high
  • Process creation: high
  • Protection
    • CPU: yes
    • Memory/IO: yes
  • Sharing overhead: high (involves at least a context switch)

Putting it Together: Threads

  • Switch overhead: low (as only CPU state)
  • Thread creation: low
  • Protection
    • CPU: yes
    • Memory/IO: no
  • Sharing overhead: low (thread switch overhead is low)

Putting it Together: Multi-Cores

  • Switch overhead: low (only CPU state)
  • Thread creation: low
  • Protection
    • CPU: yes
    • Memory/IO: No
  • Sharing overhead:
    • low (thread switch overhead low, may not need to switch CPU at all!)

Simultaneous MultiThreading/Hyperthreading

  • Hardware technique
    • Superscalar processors can execute multiple instructions that are independent
    • Hyperthreading duplicates register state to make a second “thread,” allowing more instructions to run
  • Can schedule each thread as if it were a separate CPU
    • But, sub-linear speedup!
  • Originally called "Simultaneous Multithreading"

Putting it Together: Hyper-Threading

  • Switch overhead between
    • hardware-threads: very-low (done in hardware)
  • Contention for ALUs/FPUs may hurt performance

Multiprocessing vs Multiprogramming

  • Remember Definitions:
    • Multiprocessing \(\equiv\) Multiple CPUs
    • Multiprogramming \(\equiv\) Multiple Jobs or Processes
    • Multithreading \(\equiv\) Multiple threads per Process
  • What does it mean to run two threads “concurrently”?
    • Scheduler is free to run threads in any order and interleaving: FIFO, Random, …
    • Dispatcher can choose to run each thread to completion or time-slice in big chunks or small chunks

User space invokes system calls

E.g.

  • fork()
  • exec*()
  • wait()

Kernel space keeps the PCB.

When invoking a system call (memory view)

  • The CPU is running the program code of a user process.
  • As the code is in user-space memory, the program counter is pointing to that region.
  • Then the process calls the system call "getpid()".
  • The CPU switches from the user space to the kernel space, and reads the PID of the process from the kernel.
  • When the CPU has finished executing the "getpid()" system call,
    • it switches back to the user-space memory and continues running the program code.

When invoking a system call (CPU view)

  • user process
    • user process executing
    • call system call
  • kernel
    • trap mode bit = 0
    • execute system call
    • return mode bit = 1
  • return from system call

Process real time cost (wall-clock time)

User time VS System time

  • With the tool "time"
    • Real-time
    • user time
    • sys time

Accessing hardware costs the process more time.

  • The user time and the sys time together define the performance of an application

  • Function calls cause overhead
    • Stack pushing
  • Sys calls may cause even more
    • A sys call is served by another "process" (the kernel)
    • Switching to another "process" -> context switch (will see later)
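Besides the external time tool, a process can query its own user and system time. The sketch below burns CPU in user space, then issues a batch of read() system calls on /dev/zero, and prints both figures via getrusage(); the loop counts are arbitrary.

#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)   /* pure user-space work */
        x += i;

    int fd = open("/dev/zero", O_RDONLY);
    char buf;
    for (int i = 0; i < 200000; i++)                  /* kernel work: repeated syscalls */
        read(fd, &buf, 1);
    close(fd);

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);                      /* ask the kernel for our accounting info */
    printf("user time: %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("sys  time: %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}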

fork() inside the kernel

  • Inside kernel, processes are arranged as a doubly linked list, called the task list.

fork() in action - array of opened files?

Array of opened files contains:

| Array index | Description |
|---|---|
| 0 | Standard input stream; FILE *stdin; |
| 1 | Standard output stream; FILE *stdout; |
| 2 | Standard error stream; FILE *stderr; |
| 3 or beyond | The files you opened, e.g., via fopen(), open(), etc. |
  • That's why a parent process shares the same terminal output stream as the child process.

  • A stream is just a logical object that you read or write as a sequence of bytes
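A small sketch of that sharing, using a hypothetical scratch file name: the parent open()s the file before fork(), so entry 3 of the array above is cloned into the child, and both processes append to the same file through the same shared file offset.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* "shared.txt" is just a hypothetical scratch file */
    int fd = open("shared.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (fork() == 0) {                       /* child inherits fd 3 automatically */
        const char *m = "child was here\n";
        write(fd, m, strlen(m));
        _exit(0);
    }

    wait(NULL);                              /* let the child write first */
    const char *m = "parent was here\n";
    write(fd, m, strlen(m));                 /* lands after the child's bytes: shared offset */
    close(fd);
    return 0;
}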

Working of system calls exec*()

  • exec*() is called
  • The process returns to user-space but is executing another program

exec*() in action

  • Process 1234 invoked exec*()
  • It enters the kernel space
    • The kernel then operates on the memory of the current process:
      • Local variables are cleared
      • Dynamically-allocated memory is cleared
      • Global variables are reset based on the new code
      • Code and constants are changed to the new program code.
    • The kernel will also
      • reset the register values (e.g., the program counter)

Working of system calls wait() and exit()

exit() kernel-view

  • The kernel frees all the allocated memory (in kernel space, I guess).
  • The opened files are all closed. (So it's okay to skip fclose(), though not recommended.)
  • Then, the kernel frees everything in the user-space memory of the concerned process, including program code and allocated memory.
  • The process ID still remains in the kernel's process table
    • Why?
      • [Wiki] This entry is still needed to allow the process that started the (now zombie) process to read its exit status.
    • The status of the child is now called zombie ("terminated").
  • Last but not least, the kernel notifies the parent about the termination of its child.
  • The kernel sends a SIGCHLD signal to the parent.

Summary -- what the kernel does for exit()

  • Step (1) Clean up most of the allocated kernel-space memory (e.g., the process's running-time info)
  • Step (2) Clean up the exiting process's user-space memory
  • Step (3) Notify the parent with SIGCHLD.

wait() kernel view: registering a signal handling routine

  • By default, every process does not respond to SIGCHLD
    • i.e., the parent ignores its child unless it is waiting for the child
  • But if a process has called wait()
    • The kernel will register a signal handling routine for that process (in kernel space)
  • When SIGCHLD comes, the corresponding signal handling routine is invoked!
  • Note: the parent is still inside the wait() system call

  • Default handler of SIGCHLD registered by the kernel
    • Accept and remove the SIGCHLD signal
    • Destroy the child process in the kernel space (remove it from the process table, task list, etc.)
  • The kernel
    • Deregisters the signal handling routine for the parent
    • Returns the PID of the terminated child as the return value of wait()
  • The parent then ignores SIGCHLD again (a user-space analogue is sketched below)
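The kernel-side bookkeeping above cannot be observed directly, but a user-space analogue of "react to SIGCHLD and reap the child" looks roughly like the sketch below; the handler and the WNOHANG loop are common idioms, not something prescribed by the lecture.

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_sigchld(int sig) {
    (void)sig;
    /* reap every terminated child without blocking */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void) {
    signal(SIGCHLD, on_sigchld);      /* register our own handling routine */

    if (fork() == 0) {                /* child terminates almost immediately */
        _exit(0);
    }

    sleep(1);                         /* parent keeps doing "other work" */
    printf("child reaped asynchronously, no zombie left\n");   /* ps would show no <defunct> */
    return 0;
}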

Overall - normal case

Parent's wait() after child's exit()

wait() and exit() short summary

  • The exit() system call turns a process into a zombie when
    • The process calls exit()
    • The process returns from main()
    • The process terminates abnormally
      • The kernel knows that the process terminated abnormally; hence, the kernel invokes exit() for it.
  • wait() and waitpid() are for reaping zombie child processes
    • You should never leave any zombies in the system
    • wait() and waitpid() pause the caller until
      • A child terminates/stops, OR
      • The caller receives a signal (i.e., a signal interrupts the wait())
  • Linux labels zombie processes as "<defunct>"
  • To look for them:

ps aux | grep defunct
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int pid;
    if( (pid = fork()) != 0 ) {
        printf("Look at the status of the child process %d\n", pid);
        while( getchar() != '\n' ); // "enter" here; the child process is defunct/zombie
        wait(NULL);
        printf("Look again!\n");    // "enter" here; the child process vanishes
        while( getchar() != '\n' );
    }
    return 0;
}

This program requires you to type “enter” twice before the process terminates.

You are expected to see the status of the child process change (ps aux [PID]) between the 1st and the 2nd "enter".

Calling wait() is important

  • It is not only about process execution suspension…
  • It is about system resource management.
    • A zombie takes up a PID;
    • The total number of PIDs is limited;
    • Read the limit: cat /proc/sys/kernel/pid_max
    • By default, it is 32768.
  • What will happen if we don’t clean up the zombies?

The fork bomb


int main(void) {
    while( fork() );
    return 0;
}
  • Deliberately missing wait()
  • Do not try this on department’s machines…

The first process

  • We now focus on the process-related events
    • The kernel, while it is booting up, creates the first process – init.
  • The init process:
    • has PID = 1, and
    • is running the program code /sbin/init.
  • Its first task is to create more processes…
    • Using fork() and exec*().

Process blossoming

  • You can view the tree with the command:
    • pstree or
    • pstree -A for the ASCII-character-only display.

Process blossoming... with orphans?

  • However, termination can happen, at any time and in any place…
    • This is no good because an orphan turns the hierarchy from a tree into a forest!
    • Plus, no one would know about the termination of the orphan.

Process blossoming…with re-parent

  • In Linux
    • The init process will become the step-mother of all orphans
    • It's called re-parenting
  • In Windows
    • It maintains a forest-like process hierarchy......

Re-parenting example

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int i;
    if(fork() == 0) {
        for(i = 0; i < 5; i++) {
            printf("(%d) parent's PID = %d\n",
                   getpid(), getppid() );
            sleep(1);
        }
    }
    else
        sleep(1);
    printf("(%d) bye.\n", getpid());
    return 0;
}

Output:

$ ./reparent
(1235) parent's PID = 1234
(1235) parent's PID = 1234
(1234) bye.
$ (1235) parent's PID = 1
(1235) parent's PID = 1
(1235) parent's PID = 1
(1235) bye.

What had happened during re-parenting?

Background jobs

  • The re-parenting operation enables something called background jobs in Linux
    • It allows a process to run without a parent terminal/shell
$ ./infinite_loop &
$ exit
[ The shell is gone ]
$ ps -C infinite_loop

PID TTY
1234 ... ./infinite_loop

$ _

Process lifecycle

Process lifecycle - Ready

Process lifecycle - Running

Process lifecycle - Blocking

Process lifecycle - Interruptible wait

  • Example. Reading a file.
    • Sometimes, the process has to wait for the response from the device and, therefore, it is blocked
      • this blocking state is interruptible
      • E.g., “Ctrl + C” can get the process out of the waiting state (but goes to termination state instead).

Process lifecycle - Un-Interruptible wait

Sometimes, a process needs to wait for a resource until it really gets what it wants

Process lifecycle - return back to ready

Process lifecycle - going to die

0x00 Preparation

No classes today, so I spent the whole day preparing for the interview and ended up with several thousand words of notes.

0x01 The interview

It started at 7 p.m. First, a self-introduction.

Interviewer: pick one project that best represents you.

I picked the game project.

Interviewer: our department mainly works on infrastructure; if you want to do games I can refer you to another department.

Me: immediately backpedaled.

The interviewer asked about my other projects, and C/C++ came up.

The interviewer asked me to design an interface in C/C++: take two int variables as input and output their product.

The interviewer asked about networking-related projects.

Interviewer: can TCP and UDP use the same port at the same time?

Interviewer: when can you start?

Interviewer: where is your hometown?

Interviewer: do you plan to take the graduate school entrance exam?

Interviewer: what are your plans for the future?

0x02 My questions

Me: what is the team's main business?

Me: what is the technical level of the team?

Me: when would you like me to start?

Me: I heard working at Tencent is more relaxed than at other big companies. Is that true?

Me: if I pass, how many more interview rounds are there?

Me: what should I work on learning?

Me: what were my weaknesses in this interview?

0x03 Afterword

All in all, the questions took only about twenty minutes, with few technical questions; the whole interview lasted thirty minutes.

The result came out ten minutes later: the official site showed I had passed the second-round interview and moved on to the HR interview. TEG is probably really short of people.

0x00 Background

To get math formulas rendered properly on the blog, I spent half a day hunting for tutorials. In the end I solved the formula rendering problem by following this NexT-theme post and this official guide.

The concrete approach is as follows.

  1. Make the following changes in next/config.yml.
# Math Formulas Render Support
math:
  # Default (false) will load mathjax / katex script on demand.
  # That is, it only renders those pages which have `mathjax: true` in the front-matter.
  # If you set it to true, it will load the mathjax / katex script on EVERY page.
  every_page: false

  mathjax:
    enable: true
    # Available values: none | ams | all
    tags: none

  katex:
    enable: false
    # See: https://github.com/KaTeX/KaTeX/tree/master/contrib/copy-tex
    copy_tex: false
  2. Remove the existing hexo-renderer-marked package and install the hexo-renderer-pandoc package.
$ npm un hexo-renderer-marked
- no output
$ npm i hexo-renderer-pandoc
- no output
  3. Install pandoc.

0x01 The problem

The math formulas rendered perfectly, but the unordered lists in my markdown documents broke.

The cause: I edit markdown in VS Code, which uses two spaces of indentation for unordered lists and nested sub-lists.

Markdown has no strictly unified standard and every implementation differs slightly. Pandoc, by default, expects 4 spaces of indentation for nested lists.

0x02 The fix

A quick search showed that a related issue had been filed in the pandoc project five years ago, with programmers arguing enthusiastically over whether indentation should be 4 spaces or 2.

Someone in the issue thread proposed a solution:

Sorry about the necro post.

Using --tab-stop 2 in the command line will allow you to use 2 spaces for nesting lists. Not sure of this is a deprecated or undocumented feature, works in 1.17.0.2-win.

In other words, pandoc provides the --tab-stop 2 option to set the number of spaces used for this kind of indentation. It is easy to guess that hexo calls pandoc through the hexo-renderer-pandoc npm package to render markdown.

Find the JS source that makes this call and modify it.

Path: myblog/node_modules/hexo-renderer-pandoc/index.js

The original code is:

var args = [ '-f', 'markdown-smart'+extensions, '-t', 'html-smart', math]

Change it to:

var args = [ '-f', 'markdown'+extensions, '-t', 'html', math, '--tab-stop', '2']

Removing the two "smart" suffixes fixes another bug, which I won't go into here.

week5_Note

Project

Rules for submitting pull requests

Project Rules: What not to do when making pull request(PR)?

  • You need to change your code to resolve conflicts so that developers can merge cleanly
  • No communication with the users and developers, only fixing things the way you like.
  • Do not choose documentation-related issues.
  • Do not add too many commits to your PR. Your PR should be short.
  • Make minimal changes to the current code. A PR that changes fewer lines is more likely to be accepted!
  • Hit-and-run PR: committing and running away without responding. This is very bad practice because:
    • It wastes the developers' time in code review
    • Developers need to correct your mistakes because you didn't even check whether you were implementing the correct issue
    • Developers mention you several times but you do not respond; this leaves a bad reputation on GitHub.

Recap: Failure and errors

  • What is the difference between a failure and an error?
    • Error: a discrepancy between a computed, observed or measured value or condition and the true, specified, or theoretically correct value or condition. An error is an internal system state that can lead the system to a failure.
    • Failure: the termination of the ability of an element or an item to perform a function as required, i.e., a failure occurs when a system cannot perform a required function.

Test Driven Development(TDD)

One of the core practices in XP

Kent Beck's rules

  • Beck's concept of test-driven development centers on two basic rules:
    • Write new code only if an automated test has failed.
    • Eliminate duplication.

Ambiguity in Informal

What if the maintenance contract number provided by the customer is not valid?

  • Contract number cannot contain alphabets or special characters
  • Contract number must be 5 digits
  • Contract number cannot start with 0?

Steps in Test Driven Development (TDD)

  • The iterative process
    • Quickly add a test.
    • Run all tests and see the new one fail.
    • Make a little change to code.
    • Run all tests and see them all succeed.
    • Refactor to remove duplication

Which example has better tests?

  • Each test should be independent of the others
  • Any given behaviour should be specified in one and only one test
  • The correct argument order is assertEquals(expected, actual)
  • Use .equals() to compare strings

Is there a standard measurement for test quality?

Yes, code coverage!

What is Code Coverage

  • Code coverage is a measure used to describe the degree to which the source code of a program is executed when a particular test suite runs (a form of dynamic analysis)
  • Code coverage is classified as white-box testing
    • White-box testing: testing where the internal structure / design / implementation of the item being tested is known

Benefits of Code Coverage

  • Identify untested part of codebase
  • Improve software quality through improved test coverage
  • Identify testing gaps or missing tests
  • Identify the redundant/dead code

Coverage Criteria

To measure what percentage of code has been exercised by a test suite, one or more coverage criteria are used:

  • Instructions Coverage
    • Method’s bytecode stream is a sequence of instructions for JVM
    • The Bytecode for a method are executed when that method is invoked.
  • Statements Coverage
    • Reports whether each executable statement was executed
  • Branch Coverage
    • Reports whether each Boolean expression evaluated to both true and false
  • Method Coverage
    • Reports whether a method (function) was invoked while testing the application
  • Class Coverage
    • Reports the number of classes from the code base that were covered.

Equation for Computing Coverage

\[\text{Statement Coverage} = \frac{\text{Number of executed statements}}{\text{Total number of statements}} \times 100\]

\[ \text{Branch Coverage} = \frac{\text{Number of Executed Branches}}{\text{Total number of Branches}} \times 100 \]

Code Coverage Analysis Process

  • Writing test cases and execute them
  • Finding areas of code not covered using Code Coverage Tool
  • Creating additional tests for identified gaps to increase test coverage
  • Determining a quantitative measure of code coverage

Code Coverage using JaCoCo

  • JaCoCo is an open source code coverage Tool for Java, which has been created by the EclEmma team
    • Configure the JaCoCo agent with the JVM of your system to instrument Java classes
    • A .exec file gets generated while the test cases are executed on the system
    • Generate the code coverage report using Ant (in different formats)

Mutation Testing

General View

We are performing mutation analysis whenever we

  • use well-defined rules (mutation operators)
  • defined on syntactic descriptions (grammars)
  • to make systematic changes (applied universally or according to an empirically verified distribution)
  • to the syntax or to objects developed from the syntax

Why Mutation?

  • Mutant programs are created to try to mimic typical syntactic errors made by programmers
  • Many differing mutants are run against the specified tests to assess the quality of the tests
  • The tests are attributed a score between 0 and 1, as to whether they can distinguish between the original and the mutants.

Mutators

Mutators are patterns applied to source code to produce mutations
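For instance, a relational-operator mutator might turn >= into >. The sketch below (written in C like the other sketches in these notes, with made-up function names) shows an original function, one mutant, and a boundary-value test that distinguishes them, i.e., kills the mutant.

#include <assert.h>
#include <stdio.h>

/* Original function under test (a made-up example). */
static int is_adult(int age) {
    return age >= 18;            /* original predicate */
}

/* Mutant produced by a relational-operator mutator: ">=" becomes ">". */
static int is_adult_mutant(int age) {
    return age > 18;             /* mutated predicate */
}

int main(void) {
    /* A boundary-value test: passes on the original ... */
    assert(is_adult(18) == 1);
    /* ... but would fail on the mutant, so the mutant is "killed". */
    printf("mutant says is_adult(18) = %d\n", is_adult_mutant(18));   /* prints 0 */
    return 0;
}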

Lecture 3: Process I

What is a process?

  • Process is a program in execution
  • It contains all the accounting information of that running program, e.g.:
    • Current program counter
    • Accumulated running time
    • The list of files that are currently opened by that program
    • The page table
  • Important concept: Process control block

What is a process

$ ls | cat | cat
[ctrl + C]
$
  • The command involves three processes
  • It will stop early if I send a signal to interrupt it
  • Its progress is determined by the scheduler
  • The three processes cooperate to give useful output

What are those two "cats"?

  • 2 different processes using the same code "/bin/cat"

Our Roadmap

  1. How to distinguish the two cats?
  2. Who creates the processes (and how)?
  3. Which should run first?
  4. What are those pipes?
  5. What if "ls" is feeding data too fast? Will the "cat" feel full and die?

Process identification

  • How can we identify processes from one another?
    • Each process is given a unique ID number, called the process ID, or PID
    • The system call getpid() returns the PID of the calling process.

Process creation

  • To create a process, we use the system call fork().

Process creation - fork() system call

  • So, how do fork() and the processes behave?
  • What do we know?
    • Both the parent and the child execute the same program
    • The child process starts its execution at the location where fork() returns, not from the beginning of the program (see the sketch below)
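A minimal sketch of that behaviour (referenced above): both processes continue from the point where fork() returns, and they can tell each other apart only by the return value.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    printf("this line is printed once, before fork()\n");

    pid_t pid = fork();                 /* both processes resume from here */
    if (pid == 0)
        printf("child:  fork() returned %d, my PID is %d\n", (int)pid, (int)getpid());
    else {
        printf("parent: fork() returned %d (the child's PID)\n", (int)pid);
        wait(NULL);                     /* reap the child */
    }
    return 0;
}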

Let there be only ONE CPU

  • Only one process is allowed to be executed at one time
  • However, we can't predict which process will be chosen by the OS
  • That is controlled by the OS's scheduler

IMPORTANT: For the child, the return value of fork() is zero

For how fork() manages to (apparently) return two values, see this article.

fork() behaves like "cell division"

  • It creates the child process by cloning from the parent process, including all user-space data, e.g.:

| Cloned items | Descriptions |
|---|---|
| Program counter [CPU register] | That's why they both execute from the same line of code after fork() returns |
| Program code [file & memory] | They share the same piece of code |
| Memory | Including local variables, global variables, and dynamically allocated memory |
| Opened files [kernel internal] | If the parent has opened a file "A", then the child will also have file "A" opened automatically |
  • However
    • fork() does not clone the following
    • Note: PCB is in the kernel space
| Distinct items | Parent | Child |
|---|---|---|
| Return value of fork() | PID of the child process | 0 |
| PID | Unchanged | Different, not necessarily "parent PID + 1" |
| Parent process | Unchanged | The parent |
| Running time | Accumulated | Just created, so it should be 0 |
| [Advanced] File locks | Unchanged | None |
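A small sketch of what the cloned memory in the table means: after fork(), the child's copy of a variable changes independently of the parent's. The variable name and values are made up.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int counter = 100;                       /* cloned into the child at fork() */

    if (fork() == 0) {
        counter += 1;                        /* modifies only the child's copy */
        printf("child:  counter = %d\n", counter);   /* 101 */
        return 0;
    }

    wait(NULL);
    printf("parent: counter = %d\n", counter);       /* still 100 */
    return 0;
}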

fork() can only duplicate

  • If a process can only duplicate itself and always runs the same program, it's not quite meaningful
    • how can we execute other programs?
  • We want CHANGE!
    • Meet the exec*() system call family

exec

  • execl() - member of the exec system call family (the family has 6 members)
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("before execl ...\n");
    execl("/bin/ls", "/bin/ls", NULL);
    printf("after execl ...\n");
    return 0;
}

Arguments of the execl() call

  • 1st argument: the program name, “/bin/ls” in the example.
  • 2nd argument: argument[0] to the program.
  • 3rd argument: argument[1] to the program.
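For example, passing one real argument to the program would look like the variation sketched below: argument[0] is conventionally the program name, and "-l" becomes the program's argv[1].

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* equivalent to running: ls -l */
    execl("/bin/ls", "ls", "-l", NULL);   /* argv[0]="ls", argv[1]="-l" */
    perror("execl");                      /* reached only if execl() fails */
    return 1;
}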

exec

  • The process is changing the code that it is executing and never returns to the original code
    • The last two lines of code are therefore not executed
  • The process that calls an exec*() system call will have its user-space info replaced, e.g.:
    • Program code
    • Memory: local variables, global variables, and dynamically allocated memory
    • Register values, e.g., the program counter
  • But, the kernel-space info of that process is preserved, including:
    • PID
    • Process relationship
    • etc.

When fork() meets exec*()

  • To implement the core part of a shell
  • To implement the C library call system()

fork() + exec*() = system()

  • It is very weird to allow different execution orders
  • How to let the child execute first?
    • But... we can't control the OS scheduler
  • Then, our problem becomes...
    • How to suspend the execution of the parent process?
    • How to wake the parent up after the child is terminated?

fork() + exec*() + wait() = system()

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int system_ver_CS302(const char *cmd_str) {
    if(cmd_str == NULL)
        return -1;
    if(fork() == 0) {
        execl("/bin/sh", "/bin/sh", "-c", cmd_str, NULL);
        fprintf(stderr, "%s: command not found\n", cmd_str);
        exit(-1);
    }
    wait(NULL);
    return 0;
}

int main(void) {
    printf("before...\n\n");
    system_ver_CS302("/bin/ls");
    printf("\nafter...\n");
    return 0;
}

Process Life Cycle(user-space)

wait() - user-space

wait() system call

  • Suspends the calling process into the waiting state and returns (wakes up) when
    • one of its child processes changes from
      • running to terminated
    • or a signal is received (will be covered later)
  • Returns immediately (i.e., does nothing) if
    • it has no children,
      • or a child has already terminated before the parent calls wait()

wait() vs waitpid()

wait()

  • wait for any one of the children
  • Detect child termination

waitpid()

  • Depending on the parameters, waitpid() can wait for one particular child only
  • Depending on the parameters, waitpid() can also detect other kinds of child status changes (see the sketch below)
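One common use of those parameters is non-blocking polling with WNOHANG, sketched below: the parent keeps doing its own work and checks on one particular child from time to time. The sleep durations and exit status are made up.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t child = fork();
    if (child == 0) {
        sleep(3);                 /* the child "works" for a while */
        _exit(42);
    }

    int status = 0;
    /* WNOHANG: return 0 immediately if this particular child has not changed state */
    while (waitpid(child, &status, WNOHANG) == 0) {
        printf("parent: child %d still running, doing other work...\n", (int)child);
        sleep(1);
    }
    if (WIFEXITED(status))
        printf("parent: child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}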

summary

  • A new process is created by fork()
    • Who is the first process?
  • A process is a program brought into memory by exec*()
    • It has a state (initial state = ready)
    • and waits for the OS to schedule the CPU to run it
  • Can a process execute more than one program?
    • Yes, by repeatedly calling the exec*() system call family
  • You now know how the system() C library call is implemented with the syscalls fork(), exec*(), and wait()

exec*() – arguments explained

  • Environment variables
  • A set of strings maintained by the shell.
#include <stdio.h>

int main(int argc, char **argv, char **envp) {
    int i;
    for(i = 0; envp[i]; i++)
        printf("%s\n", envp[i]);
    return 0;
}

The “**envp” variable is an array of strings

A string is an array of characters

  • Environment variables
    • A set of strings maintained by the shell.
    • Quite a number of programs will read and make use of the environment variable.
| Variable name | Description |
|---|---|
| SHELL | The path to the shell that you're using |
| PWD | The full path to the directory that you're currently in |
| HOME | The full path to your home directory |
| USER | Your login name |
| EDITOR | Your default text editor |
| PRINTER | Your default printer |
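Besides receiving envp in main(), a program can look up individual variables from the table above with getenv(); a minimal sketch:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *vars[] = { "SHELL", "HOME", "USER" };
    for (int i = 0; i < 3; i++) {
        const char *v = getenv(vars[i]);     /* NULL if the variable is not set */
        printf("%s=%s\n", vars[i], v ? v : "(not set)");
    }
    return 0;
}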

Lecture 2 Fundamental OS Concepts

Our roadmap

  • Computer organization revision
  • Kernel data structures in OS
  • OS history
  • Four fundamental OS concepts
    • Thread
    • Address space(with translation)
    • Process
    • Dual mode operation/ protection

Four Components of a computer system

  • Computer hardware
  • Operating system
  • system and application programs
  • user

Computer System Organization

  • Computer-system operation
    • One or more CPUs, device controllers connect through common bus providing access to shared memory
    • Concurrent execution of CPUs and devices competing for memory cycles

Kernel Data Structures

  • Many similar to standard programming data structures
    • Singly linked list
    • Doubly linked list
    • Circular linked list
  • Binary search tree: left <= right
    • Worst-case search performance is O(n)
    • A balanced binary search tree is O(lg n)
  • Hash function can create a hash map
  • Bitmap - string of n binary digits representing the status of n items
  • Linux data structures defined in include files <linux/list.h>,<linux/kfifo.h>,<linux/rbtree.h>

Four Fundamental OS Concepts

  • Thread
    • Single unique execution context: fully describes program state
    • Program Counter, Register, Execution Flags, Stack
  • Address space(with translation)
    • Programs execute in an address space that is distinct from the memory space of the physical machine
  • Process
    • An instance of an executing program is "a process consisting of an address space and one or more threads of control"
  • Dual mode operation / Protection
    • Only the "system" has the ability to access certain resources
    • The OS and the hardware are protected from user programs and user programs are isolated from one another by "controlling the translation" from program virtual addresses to machine physical addresses.

OS Bottom Line: Run Programs

  • Load instruction and data segments of executable file into memory
  • Create stack and heap
  • Transfer control to program
  • Provide services to program
  • While protecting OS and program

First OS Concept: Thread of Control

  • Certain registers hold the context of thread
    • Stack pointer holds the address of the top of stack
      • Other conventions: Frame pointer, Heap pointer, Data
    • May be defined by the instruction set architecture or by compiler conventions
  • Thread: single unique execution context
    • Program Counter, Registers, Execution Flags, Stack
  • A thread is executing on a processor when it is resident in the processor registers
  • The PC register holds the address of the executing instruction in the thread
  • Registers hold the root state of the thread
    • The rest is "in memory"

Second OS Concept: Program's address space

  • Address space: the set of accessible addresses + state associated with them
  • For a 32-bit processor there are 2^32 ≈ 4 billion addresses
  • What happens when you read or write to an address?
    • Perhaps nothing
    • Perhaps it acts like regular memory
    • Perhaps it ignores writes
    • Perhaps it causes an I/O operation
    • Perhaps it causes an exception (fault)

Multiprogramming - Multiple Threads of Control

How can we give the illusion of multiple processors?

  • Assume a single processor: How do we provide the illusion of multiple processors?
    • Multiplex in time!
  • Each virtual "CPU" needs a structure to hold:
    • Program Counter(PC), Stack Pointer(SP)
    • Registers(Integer, Floating point, others)
  • How do we switch from one virtual CPU to the next?
    • Save PC, SP, and registers in the current state block
    • Load PC, SP, and registers from the new state block
  • What triggers the switch?
    • Timer, voluntary yield, I/O, other things

The Basic Problem of Concurrency

  • The basic problem of concurrency involves resources:
    • Hardware: single CPU, single DRAM, single I/O devices
    • Multiprogramming API: processes think they have exclusive access to shared resources
  • OS has to coordinate all activities
    • Multiple processes, I/O interrupts, ...
    • How can it keep all these things straight?
  • Basic Idea: Use the Virtual Machine abstraction
    • Simple machine abstraction for processes
    • Multiplex these abstract machines
  • Dijkstra did this for the "THE system"
    • A few thousand lines vs 1 million lines in OS/360 (1K bugs)

Properties of this simple multiprogramming technique

  • All virtual CPUs share same non-CPU resources
    • I/O devices the same
    • Memory the same
  • Consequence of sharing:
    • Each thread can access the data of every other thread (good for sharing, bad for protection)
    • Threads can share instructions (good for sharing, bad for protection)
    • Can threads overwrite OS functions?
  • This (unprotected) model is common in:
    • Embedded applications
    • Windows 3.1 / early Macintosh (switch only with yield)
    • Windows 95–ME (switch with both yield and timer)

Protection

  • Operating System must protect itself from user programs
    • Reliability: compromising the operating system generally causes it to crash
    • Security: limit the scope of what processes can do
    • Privacy: limit each process to the data it is permitted to access
    • Fairness: each should be limited to its appropriate share of system resources(CPU time, memory, I/O, etc)
  • It must protect User programs from one another
  • Primary Mechanism: limit the translation from program address space to physical memory space
    • Can only touch what is mapped into process "address space"
  • Additional Mechanisms:
    • Privileged instructions, in/out instructions, special register
    • syscall processing, subsystem implementation
      • e.g., file access rights, etc

Third OS Concept: Process

  • Process: execution environment with Restricted Rights
    • Address Space with One or More Threads
    • Owns memory(address space)
    • Owns file descriptors, file system context, ...
    • Encapsulates one or more threads sharing process resources
  • Why processes?
    • Protected from each other!
    • OS protected from them
    • Processes provide memory protection
    • Threads are more efficient than processes (later)
  • Fundamental tradeoff between protection and efficiency
    • Communication easier within a process
    • Communication harder between processes
  • Application instance consists of one or more processes

Single and Multithreaded Processes

  • Threads encapsulate concurrency:"Active" component
  • Address spaces encapsulate protection: "Passive" part
    • Keeps buggy program from trashing the system
  • Why have multiple threads per address space?

Fourth OS Concept: Dual Mode Operation

  • Hardware provides at least two modes:
    • "Kernel" mode(or "supervisor" or "protected")
    • "User" mode: Normal programs executed
  • What is needed in the hardware to support "dual mode" operation?
    • A bit of state (user/system mode bit)
    • Certain operations / actions only permitted in system/kernel mode
      • In user mode they fail or trap
  • User -> Kernel transition sets system mode AND saves the user PC
    • Operating system code carefully puts aside user state then performs the necessary operations
  • Kernel -> User transition clears system mode AND restores appropriate user PC
    • return-from-interrupt

Unix System Structure

  • User Mode
    • Applications
    • Standard Libs :
      • shells and commands
      • compilers and interpreters
      • system libraries
  • Kernel Mode
    • system-call interface to the kernel
    • signals terminal handling
    • character I/O system
    • terminal drivers
    • file system
    • swapping block I/O system
    • disk and tape drivers
    • CPU scheduling
    • page replacement
    • demand paging
    • virtual memory
    • kernel interface to the hardware
  • Hardware
    • terminal controllers
    • terminals
    • device controllers
    • disks and tapes
    • memory controllers
    • physical memory

Simple Protection: Base and bound(B&B)

  • Requires relocating loader
  • Still protects OS and isolates program
  • No addition on address path

Another idea: Address Space Translation

  • Programs operate in an address space that is distinct from the physical memory space of the machine

A simple address translation with Base and Bound

  • Can the program touch OS?
  • Can it touch other programs?
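A conceptual sketch of the base-and-bound check (not real MMU code): every program-issued address is compared against the bound and then offset by the base, so the answer to both questions above is "no" as long as the OS loads the two registers correctly. The register values are made up.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical values the OS would load into the base/bound registers */
static uint32_t base  = 0x1000;
static uint32_t bound = 0x0400;   /* size of the program's address space */

static uint32_t translate(uint32_t vaddr) {
    if (vaddr >= bound) {                 /* outside the program's space */
        fprintf(stderr, "protection fault at 0x%x\n", (unsigned)vaddr);
        exit(1);                          /* real hardware would trap to the OS instead */
    }
    return base + vaddr;                  /* relocated physical address */
}

int main(void) {
    printf("vaddr 0x010 -> paddr 0x%x\n", (unsigned)translate(0x010));   /* 0x1010 */
    printf("vaddr 0x500 -> ...\n");
    translate(0x500);                     /* triggers the simulated "fault" */
    return 0;
}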

Tying it together: Simple B&B: OS loads process

Simple B&B: OS gets ready to execute process

  • Privileged Inst: set special registers
  • RTU

Simple B&B: User Code Running

3 types of Mode Transfer

  • Syscall
    • Process requests a system service, e.g., exit
    • Like a function call, but "outside" the process
    • Does not have the address of the system function to call
    • Like a Remote Procedure Call(RPC) - for later
    • Marshall the syscall id and args in registers and exec syscall
  • Interrupt
    • External asynchronous event triggers context switch
    • e.g., Timer, I/O device
    • Independent of user process
  • Trap or Exception
    • Internal synchronous event in process triggers context switch
    • e.g., protection violation (segmentation fault), divide by zero, ...
  • All 3 are UNPROGRAMMED CONTROL TRANSFER
    • Where does it go?
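The syscall case can be seen from user code: the sketch below issues the same request once through the libc wrapper and once by passing the syscall number explicitly via syscall(); SYS_getpid is Linux-specific.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>     /* SYS_getpid (Linux) */
#include <unistd.h>

int main(void) {
    pid_t a = getpid();                    /* libc wrapper around the syscall */
    long  b = syscall(SYS_getpid);         /* marshal the id ourselves and trap into the kernel */
    printf("getpid() = %d, syscall(SYS_getpid) = %ld\n", (int)a, b);
    return 0;
}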

How do we get the system target address of the "unprogrammed control transfer?"

Interrupt Vector

  • Address and properties of each interrupt handler

Simple B&B: User => Kernel

  • How to return to the system?

Simple B&B: Interrupt

  • How to save registers and set up system stack?

Simple B&B: Switch User Process

Simple B&B: "resume"

Dual Mode Operation

Conclusion: Four fundamental OS concepts

  • Thread
  • Address Space with Translation
  • Process
  • Dual Mode operation/Protection

Week4_note

Gorgias (c. 480–370 BC)

  • Three propositions
    • Nothing exists
    • Even if something exists, it cannot be known
    • Even if it can be known, it cannot be communicated to others
  • Interpretation:
    • First, if something exists, how do you prove that it really exists?
    • Second, if things can be known, how do you prove that your knowledge of a thing is what the thing really is?
    • Third, if the result of knowing a thing can be told to others, how do you prove that what language expresses is the content of your thought?

Perception is not consistent with the true state of things.

Perception is "not in accordance with truth".

The historical standing of the Sophist movement

The great value of the whole Sophist movement lies in this: it stimulated people's thinking and demanded that philosophy, religion, custom, morality, and the institutions built on them justify their own reasonableness. The Sophists denied the possibility of knowledge, which made it necessary for someone to explain how knowledge is possible; they forced philosophy to seek a criterion of knowledge. They criticized traditional morality, forcing morality to defend itself against skepticism and nihilism and to find reasonable principles of right and wrong. They criticized traditional religious beliefs, forcing thinkers to see the need for a more complete and pure conception of God.

Phase two: the formation of Socratic philosophy

(Socrates, 469–399 BC)

  • Taught but wrote nothing
  • He left no writings; his thought has come down to us through "dialogues". One of the greatest philosophers in the history of Western thought.
  • Reportedly quite ugly (?)

2021 Tencent backend development summer internship interview notes, March 10

I submitted my resume on the 8th and got a call from a Tencent interviewer on the afternoon of the 10th, scheduling a phone interview for 7 p.m. that evening. Very efficient.

The interview

The interviewer called at 7 p.m. First I was asked to introduce myself.

Seeing the C++ projects on my resume, he asked whether I had used the STL. I said I had used some of it and mentioned unordered_map.

The interviewer then asked about the low-level details of unordered_map. I answered that I was not familiar with C++ internals, since my school teaches introductory programming in Java.

Java

The interviewer switched to Java and asked about HashMap's underlying implementation and data structures.

  • Asked about the insert/delete/lookup/update process of HashMap.

  • Asked how HashMap achieves a uniform hash distribution.

  • Asked why HashMap uses red-black trees.

  • Asked what to do when the array in a HashMap becomes too large, in connection with the sparse matrix in my project.

  • Asked about the basic principles of red-black trees.

  • Asked about AVL trees: how an AVL tree stays balanced, and how rotations work.

  • Asked about hash function principles and string hashing methods.

  • Asked about HashMap's hash function implementation.

  • Asked about HashMap's resizing mechanism.

  • Asked whether HashMap is suitable for storing a very large amount of data.

  • Asked how to store everyone in the school in a HashMap using names as keys, and what to do about duplicate names.

Operating systems

He asked about operating systems; I answered that I had just started the course and hadn't finished it, so he didn't ask anything more about OS.

Networking

He asked about computer networks, starting with whether I knew HTTP.

Asked about the differences between TCP and UDP.

Asked about the DNS request process.

Asked about the three-way handshake and the four-way teardown.

Asked what happens after you type a URL into the browser address bar and hit Enter.

Asked, ignoring machine processing time, how many RTTs are needed after the browser issues an HTTP request before it receives the first character of the response.

Asked whether I knew HTTPS, and to explain what I knew.

MISC

Asked: given a billion integers, how to find the largest 1000 (the Top-K problem).

  • Asked how to read a huge file with a byte stream.
  • Asked about the data-structure implementation of a heap.
  • Asked about the concrete idea behind quickselect.

Asked: given a billion integers, how to find the median.

Asked: given a billion integers, how to find the K most frequent numbers.

Coding on the spot

// Convert an IP address into a dotted-decimal string
// e.g. 0 -> 0.0.0.0

public String convert(int ip) {
    // code here
    return null;
}

My questions

I asked which business group the interviewer belonged to. Answer: TEG.

I asked roughly how many more interview rounds there were and when the next one would be. Answer: if I passed, someone would contact me within a few days; about two or three more rounds.

I asked for constructive feedback. Answer: nothing in particular.

Topics not asked

Databases

Design patterns

Resume projects