What is the fundamental difference between a process and a thread?

On Linux, I always get the feeling that a thread is just an evolved version of a process.

Helpful reply · 2019-6-1 20:17:19

This email from Linus (Linux-Kernel Archive: Re: proc fs and shared pids) is the best explanation of threads and processes I have ever seen.

On Mon, 5 Aug 1996, Peter P. Eiserloh wrote:
>
> We need to keep clear the concept of threads. Too many people
> seem to confuse a thread with a process. The following discussion
> does not reflect the current state of linux, but rather is an
> attempt to stay at a high level discussion.
NO!

There is NO reason to think that "threads" and "processes" are separate
entities. That's how it's traditionally done, but I personally think it's a
major mistake to think that way. The only reason to think that way is
historical baggage.

Both threads and processes are really just one thing: a "context of
execution". Trying to artificially distinguish different cases is just
self-limiting.

A "context of execution", hereby called COE, is just the conglomerate of
all the state of that COE. That state includes things like CPU state
(registers etc), MMU state (page mappings), permission state (uid, gid)
and various "communication states" (open files, signal handlers etc).

Traditionally, the difference between a "thread" and a "process" has been
mainly that a thread has CPU state (+ possibly some other minimal state),
while all the other context comes from the process. However, that's just
_one_ way of dividing up the total state of the COE, and there is nothing
that says that it's the right way to do it. Limiting yourself to that kind of
image is just plain stupid.

The way Linux thinks about this (and the way I want things to work) is that
there _is_ no such thing as a "process" or a "thread". There is only the
totality of the COE (called "task" by Linux). Different COE's can share parts
of their context with each other, and one _subset_ of that sharing is the
traditional "thread"/"process" setup, but that should really be seen as ONLY
a subset (it's an important subset, but that importance comes not from
design, but from standards: we obviously want to run standards-conforming
threads programs on top of Linux too).

In short: do NOT design around the thread/process way of thinking. The
kernel should be designed around the COE way of thinking, and then the
pthreads _library_ can export the limited pthreads interface to users who
want to use that way of looking at COE's.

Just as an example of what becomes possible when you think COE as opposed
to thread/process:

- You can do an external "cd" program, something that is traditionally
impossible in UNIX and/or process/thread (silly example, but the idea
is that you can have these kinds of "modules" that aren't limited to
the traditional UNIX/threads setup). Do a:

    clone(CLONE_VM|CLONE_FS);
    child: execve("external-cd");
    /* the "execve()" will disassociate the VM, so the only reason we
       used CLONE_VM was to make the act of cloning faster */

- You can do "vfork()" naturally (it needs minimal kernel support, but
that support fits the COE way of thinking perfectly):

    clone(CLONE_VM);
    child: continue to run, eventually execve()
    mother: wait for execve

- you can do external "IO daemons":

    clone(CLONE_FILES);
    child: open file descriptors etc
    mother: use the fd's the child opened and vv.

All of the above work because you aren't tied to the thread/process way of
thinking. Think of a web server for example, where the CGI scripts are done
as "threads of execution". You can't do that with traditional threads,
because traditional threads always have to share the whole address space, so
you'd have to link in everything you ever wanted to do in the web server
itself (a "thread" can't run another executable).

Thinking of this as a "context of execution" problem instead, your tasks can
now choose to execute external programs (= separate the address space from the
parent) etc if they want to, or they can for example share everything with
the parent _except_ for the file descriptors (so that the sub-"threads" can
open lots of files without the parent needing to worry about them: they close
automatically when the sub-"thread" exits, and it doesn't use up fd's in the
parent).

Think of a threaded "inetd", for example. You want low overhead fork+exec, so
with the Linux way you can instead of using a "fork()" you write a
multi-threaded inetd where each thread is created with just CLONE_VM (share
address space, but don't share file descriptors etc). Then the child can
execve if it was an external service (rlogind, for example), or maybe it was
one of the internal inetd services (echo, timeofday) in which case it just
does its thing and exits.

You can't do that with "thread"/"process".

Linus

Helpful reply · 2019-6-1 20:17:20

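First, I'll assume you are asking about OS-level threads rather than user-space threads (so I won't get into Python, Go and the like).
The textbook answer:
In an operating system, the process is usually the basic unit of resource allocation, while the thread is the basic unit of independent execution and scheduling.
What does that mean?
1. The process as the basic unit of resource allocation
What counts as a resource? Memory, files, sockets and so on. For example, when you `new` a block of memory, the operating system maps memory into your process's address space. That memory belongs to your process: every thread in the process can access it, and other processes cannot. Other kinds of resources work the same way. That is why the process, not the thread, is the basic unit of resource allocation: all threads in the same process can access these resources.
2. The thread as the basic unit of independent execution and scheduling
How does a thread run, and how is it scheduled?
Start with running. Loosely speaking, the running state of a thread is represented by the CPU registers. Using the 16-bit x86 names, the important ones are: ip, the offset of the next instruction within the current code segment; cs, the current code segment, so that cs:ip gives the address of the next instruction; ss, the stack segment; and sp and bp, the top and base of the current stack frame, where all local variables live. (On x86-64 the corresponding registers are rip, rsp and rbp.) There is also a pile of general-purpose registers that I won't go into. From these registers we can tell (1) where execution currently is and what comes next, and (2) the call stack: which function we are in, which function called it, and where to return when it finishes. Running a thread simply means the CPU executing instructions one after another (ip advancing), while the stack grows and shrinks as functions are called and return (sp and bp changing).
Now scheduling. If the above is clear, scheduling is just swapping out the set of registers belonging to one thread and swapping in another set: that is what pausing one thread and resuming another means.
Each thread has its own such set of saved registers (textbooks keep it in what they call a thread control block, TCB). Whichever thread's registers are loaded on the CPU is the one doing work (running); the others wait until they are swapped in (scheduled). That is why we say running and scheduling apply to threads, while the process merely provides the threads with a place to run and the resources they use.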

Helpful reply · 2019-6-1 20:17:21

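First of all, the concepts of process and thread have kept evolving, so the definitions in different sources are often inconsistent.

In the traditional process model, a process covers two aspects:

Unit of scheduling and execution: each process has its own run state, priority, registers and so on, and is the basic unit the OS schedules.
Resource ownership: resources such as the program text, data and open files. A process owns these resources, and the OS provides protection to prevent conflicts over resources between processes.

Since these are two independent functions, can they be separated? That is where the thread comes in:

Basic unit of execution and scheduling: thread
Resource ownership: process

In this model, the basic unit the kernel schedules is the thread rather than the process; the process only manages resources, which are shared by the threads within it.
Threads bring the following benefits:

Creating, terminating and switching threads costs much less than doing the same for processes.
Because threads share an address space, communication between threads is much more efficient than between processes.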

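Why is creating a thread cheap? Because a new thread in an existing process only needs its own PC, (general-purpose) registers, stack (note that the stack is also per-thread!) and thread state (blocked, waiting, ...). A new process, by contrast, also needs what Modern Operating Systems lists as per-process items: an address space, global variables, open files, child processes, pending alarms, signals and signal handlers, and accounting information.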

Clearly, the overhead associated with a process is much larger.

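Back then, operating system kernels provided no support for threads, so threads existed as user-level threads. In this model the operating system is entirely unaware of threads and still schedules whole processes.

Later, as the kernels of modern Unix-like systems, Windows NT and others added thread support, the kernel-level thread model appeared, in which the kernel knows about and schedules each thread.

There are also hybrid implementations, which I won't go into here. User-level and kernel-level threads each have their strengths, which are hard to sum up in a few sentences. Briefly:

1. User-level threads have much lower overhead (no switch from user mode to kernel mode).
2. When one user-level thread blocks, every thread in the same process blocks (one blocks, they all block), because the operating system blocks the entire process.
3. User-level threads of the same process cannot run on several CPUs at the same time (the kernel doesn't even know the threads exist).
4. Kernel-level threads have higher overhead (a switch goes from user mode to kernel mode and back to user mode), but they overcome drawbacks 2 and 3 of user-level threads.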

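To sum up, in the classic process/thread model (modern Unix with kernel-level threads):

A process is a container of resources and holds one (or more) threads.
The basic unit the kernel schedules is the thread, not the process.
Threads in the same process share resources (address space, open files, signal handlers, etc.), but not registers, stacks or the PC.

Basically, this is the model you will find in textbooks.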

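------------------------- Below: the confusing processes and threads of Linux -------------------------

Warning: I got carried away with what follows. If you're not especially interested in Linux, feel free to upvote and leave :)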

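1. In Linux, the kernel uses a single data structure, task_struct, to represent what are traditionally called processes and threads (everything is simply a runnable task), instead of defining a separate structure for each, as most operating systems do. A single-threaded process is represented by one task_struct, a multi-threaded process by several.

2. Linux provides a system call, clone (it already existed in 1996, as Linus's email above shows), that blurs the boundary between processes and threads. Simply put, compared with the traditional fork, clone lets the caller choose through flags which normally per-process resources the new task shares with its creator. The main flags include CLONE_VM (share the address space), CLONE_FS (share filesystem information such as the root and current directory), CLONE_FILES (share the file descriptor table), CLONE_SIGHAND (share signal handlers) and CLONE_THREAD (put the new task in the caller's thread group).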


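In Linux, both processes and threads are ultimately created by cloning an existing task (process or thread); the difference lies in the flags passed to clone. Roughly speaking, when the CLONE_VM bit is set (shared address space) you get a thread, and when it is not set you get a process. (Strictly, it is CLONE_THREAD that puts the new task into the caller's thread group, i.e. the same process; vfork, for example, also sets CLONE_VM yet creates a new process.)

fork: process creation, implemented as a clone that shares nothing.
pthread_create: thread creation, implemented as a clone that shares as much as possible.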

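3. In Linux, tasks that share memory (more precisely, tasks in the same thread group) can be regarded as different threads of the same process. From the kernel's point of view, every task has a PID (Process IDentifier), and note that threads of the same process have different PIDs! In addition, every task has a TGID (Thread Group ID); threads of the same process have the same TGID, and this TGID is what we normally call the PID.

(To the computer it's only a name, of course. This per-task identifier would more properly be called a TID, but the notion of a PID is too deeply ingrained, so...)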

A small experiment: look at the running processes with htop:

$ htop


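In the output, focus on the PID and TGID columns, and also on memory usage. Notice anything? All the memory figures are identical, which shows these are threads of the same process, yet their PIDs differ while their TGIDs are the same.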

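POSIX, however, requires all threads of the same process to have the same PID, which also matches how we usually understand a PID (it is a process ID, after all!). And indeed, with ps, the other commonly used Linux command, threads of the same process show the same PID: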

$ ps -eLf

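(LWP stands for light weight process; here it simply means a thread.)

That's right! top and ps, two of the most commonly used Linux commands, are actually inconsistent about PID, a concept that is confusing on Linux:

Their documentation confirms it: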

ps documentation:

pid PID a number representing the process ID (alias tgid).

top documentation:

PID The task's unique process ID, which periodically wraps, though never restarting at zero. In kernel terms, it is a dispatchable entity defined by a task_struct.

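Is this a contradiction? No. It is simply that one of them uses the user view (ps: PID means the TGID) and the other the kernel view (top: PID means the task's own ID).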

（from stackoverflow）

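4. Linux thread implementations

Linux has used two thread implementations, with kernel 2.6 as the dividing line, both provided by the GNU C Library (glibc):

LinuxThreads (no longer supported since glibc 2.4)
NPTL (Native POSIX Threads Library)

The first implementation had many features incompatible with POSIX, such as the problem mentioned above of threads in the same process not sharing a PID. NPTL fixed this, although a few incompatibilities remain.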

Some references:
1. Andrew S. Tanenbaum, Modern Operating Systems
2. William Stallings, Operating Systems: Internals and Design Principles (Chinese translation by Chen Xiangqun)
3. Linux man pages
4. http://stackoverflow.com/questions/807506/threads-vs-processes-in-linux
5. Linux - Threads and Process

Helpful reply · 2019-6-1 20:17:22

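Your feeling is right: there is no fundamental difference. Linux implements both processes and threads with a single structure, task_struct; the differences are minor details such as scheduling domains and waiting. In essence, both just need to save the state a task uses while running and serve as an independently schedulable unit. The difference is whether the user-space state is fully shared, and processes can in fact share state too, with more effort (shared memory, for example).
Fundamentally, POSIX requires both fork and pthread_create, but that is only an interface requirement. Drawing this distinction doesn't matter all that much; what matters more is understanding where race conditions can occur, and multi-process programs have race conditions too. With a sound design, a multithreaded program can be about as good as a multi-process one.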

Helpful reply · 2019-6-1 20:17:23

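Operating systems have two important concepts: concurrency and isolation. Both are easy to understand if you think about how operating systems grew from very simple beginnings into the powerful systems of today.

Concurrency is about keeping hardware utilization as high as possible; otherwise computers could not have kept advancing along Moore's law, and ordinary people could not afford such cheap computers. Code does not need threads to run concurrently: the thread is just an operating-system concept that makes code easier to write and to understand. Many operating systems for microcontrollers have no notion of threads, and neither does assembly language. It is a bit like introducing design patterns into programming. A thread is essentially a structure holding an execution pointer plus some register state. With threads, code only needs to care about which thread it runs in, instead of, as on a system without threads, saving register state by hand in assembly, handing out time slices and so on. One purpose of an operating system is to simplify application development, and the thread concept does exactly that. To picture a world without threads, imagine writing an assembly program for a CPU like the 8086 that does several things at once. A concept that has become popular recently is the fiber, which is very similar to a thread; the difference is that fibers aim for high concurrency at the application level, while threads provide concurrency at the system level.

Isolation is the next important problem after concurrency. Concurrent execution means several pieces of code are running, and one piece of code failing must not bring down the whole system, so they have to be isolated. Isolation raises the question of scope. Code itself is stateless; what carries state is resources, so the scope of isolation is generally tied to resources. A computer's resources are generally shared, and isolation must guarantee that when something crashes, its resources can be reclaimed without affecting other code that uses them; otherwise you might as well blue-screen and reboot. So an operating system with only threads and no processes is possible; it would just crash a lot, and early operating systems were rather like that.
To sum up: threads are about concurrency, and processes are about isolation. The thread is basically a concept introduced so that code can execute concurrently: because CPU time slices have to be handed out, a paused thread must be able to resume exactly as if it had never been paused. A process is essentially a group of threads plus the resources those threads acquire while running; if it dies, all of those resources must be reclaimable without affecting other programs.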

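Human society is a bit like this too. Think of the Earth's resources and means of production as hardware and people as threads. For efficiency, the best approach is an assembly line with time-shared resources: some people work days and some work nights, so society keeps running around the clock. But everyone is different and resources are unevenly distributed across the globe, just as programs differ: some are CPU-hungry, others I/O-bound. A program usually exists to solve one problem, which is like a group of people with a common goal; put a group of people together with a pool of resources and you have something like a country. Isolation ensures that the collapse of one country does not destroy the whole Earth, and inter-process communication and the like are the equivalent of international trade.

Many ideas in computing come from the real world. If you picture the circumstances of the time, it is easy to understand why a concept arose and what problem it was meant to solve. Had you been there, you might have come up with these two ideas yourself; the truly brilliant people are simply the ones who managed to abstract them. That is also why it helps to read the introductions of many papers as history books and put yourself in that period when thinking them through.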
