<div class="blogpost-body" id="cnblogs_post_body">
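<p>Merging or joining datasets links rows together using one or more keys. The pandas <span style="text-decoration:underline;"><strong>merge</strong></span> function is the main entry point for applying these operations to data.</p>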
<p><strong>Many-to-one</strong>: df1 has multiple rows labeled a and b, whereas each value in the key column of df2 corresponds to exactly one row.</p>
<div class="cnblogs_code">
<pre class="blockcode">import pandas as pd
from pandas import DataFrame
# Left table: the keys a and b each appear on several rows
df1 = DataFrame({<span style="color:#800000;">'</span><span style="color:#800000;">key</span><span style="color:#800000;">'</span>: [<span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">c</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span><span style="color:#000000;">],
</span><span style="color:#800000;">'</span><span style="color:#800000;">data1</span><span style="color:#800000;">'</span>: range(7<span style="color:#000000;">)})
# Right table: each key appears on exactly one row
df2 </span>= DataFrame({<span style="color:#800000;">'</span><span style="color:#800000;">key</span><span style="color:#800000;">'</span>: [<span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">d</span><span style="color:#800000;">'</span><span style="color:#000000;">],
</span><span style="color:#800000;">'</span><span style="color:#800000;">data2</span><span style="color:#800000;">'</span>: range(3)})</pre>
</div>
<table border="0"><tbody><tr><td><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-aa4fd6bc9742647a1158bf6ab234561a.png"><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-5df46d7fd45f0006509ad6a5574ff6a2.png"></td></tr></tbody></table>
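<p>The many-to-one relationship described above can be checked in code. The following is a minimal sketch, not part of the original post: it rebuilds df1 and df2, counts how often each key occurs, and passes the <code>validate</code> argument of <code>pd.merge</code> (an addition here, assuming a reasonably recent pandas version) so that pandas itself confirms the right-hand keys are unique. The variable names <code>left_counts</code>, <code>right_unique</code>, <code>merged</code> and <code>swapped_ok</code> are illustrative only.</p>

```python
import pandas as pd

# Same frames as in the post
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3)})

# The "many" side: a and b occur on several rows of df1
left_counts = df1['key'].value_counts()

# The "one" side: every key in df2 occurs exactly once
right_unique = df2['key'].is_unique

# validate='m:1' asks pandas to raise if the right-hand keys are not unique
# (assumption: not used in the original post)
merged = pd.merge(df1, df2, on='key', validate='m:1')

# Swapping the frames puts the repeated keys on the right, so the check fails
try:
    pd.merge(df2, df1, on='key', validate='m:1')
    swapped_ok = True
except pd.errors.MergeError:
    swapped_ok = False

print(left_counts)
print(right_unique, len(merged), swapped_ok)
```

<p>Here the merge succeeds with df1 on the left, while the swapped call is rejected because the keys of df1 repeat.</p>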
<p>Note: if you do not specify which column to join on, merge uses the overlapping column names as the keys by default.</p>
<div class="cnblogs_code">
<pre class="blockcode"><span style="color:#000000;"># Join on the overlapping column (key) by default
pd.merge(df1, df2)
# Equivalent call with the key named explicitly
pd.merge(df1, df2, on</span>=<span style="color:#800000;">'</span><span style="color:#800000;">key</span><span style="color:#800000;">'</span>)</pre>
</div>
<table border="0"><tbody><tr><td><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-e1c5e291af493a1d0620b1027718087a.png"><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-516fa2487d66cbb31e451da79583e97a.png"></td></tr></tbody></table>
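<p>Since the results above are shown only as images, here is a small runnable sketch of the same two calls. It is an illustration rather than the author's code: it computes the overlapping column names by hand and checks that the implicit and explicit merges agree. It also relies on the fact that <code>pd.merge</code> defaults to an inner join, which the post does not state, so keys present in only one frame (c and d) are dropped. The names <code>common</code>, <code>implicit</code> and <code>explicit</code> are hypothetical.</p>

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3)})

# Column names shared by both frames; these become the default join keys
common = df1.columns.intersection(df2.columns).tolist()

implicit = pd.merge(df1, df2)             # keys inferred from overlapping names
explicit = pd.merge(df1, df2, on='key')   # key named explicitly

print(common)                             # ['key']
print(implicit.equals(explicit))          # True
# Default how='inner': only keys found in both frames survive
print(sorted(implicit['key'].unique()))   # ['a', 'b']
```

<p>Naming the key with <code>on</code> makes the intent visible and guards against a surprise if the frames later gain another column with the same name.</p>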
<p>If the key columns have different names in the two objects, you can specify each one separately:</p>
<div class="cnblogs_code">
<pre class="blockcode">df3 = DataFrame({<!-- --><span style="color:#800000;">'</span><span style="color:#800000;">lkey</span><span style="color:#800000;">'</span>: [<span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">c</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span><span style="color:#000000;">],
</span><span style="color:#800000;">'</span><span style="color:#800000;">data1</span><span style="color:#800000;">'</span>: range(7<span style="color:#000000;">)})
df4 </span>= DataFrame({<!-- --><span style="color:#800000;">'</span><span style="color:#800000;">rkey</span><span style="color:#800000;">'</span>: [<span style="color:#800000;">'</span><span style="color:#800000;">a</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">b</span><span style="color:#800000;">'</span>, <span style="color:#800000;">'</span><span style="color:#800000;">d</span><span style="color:#800000;">'</span><span style="color:#000000;">],
</span><span style="color:#800000;">'</span><span style="color:#800000;">data2</span><span style="color:#800000;">'</span>: range(3<span s |
|