Data Wrangling: Merging

Posted by an anonymous user, 2021-5-28 02:12
<div class="blogpost-body" id="cnblogs_post_body">
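<p>Merge or join operations combine datasets by linking rows through one or more keys. The pandas <span style="text-decoration:underline;"><strong>merge</strong></span> function is the main entry point for applying these algorithms to your data.</p>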
<p><strong>Many-to-one</strong>: df1 has several rows labeled a and b, whereas each value in the key column of df2 corresponds to exactly one row.</p>
<div class="cnblogs_code">
  <pre class="blockcode">df1 &#61; DataFrame({<!-- --><span style="color:#800000;">&#39;</span><span style="color:#800000;">key</span><span style="color:#800000;">&#39;</span>: [<span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">c</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span><span style="color:#000000;">],
                 </span><span style="color:#800000;">&#39;</span><span style="color:#800000;">data1</span><span style="color:#800000;">&#39;</span>: range(7<span style="color:#000000;">)})
df2 </span>&#61; DataFrame({<!-- --><span style="color:#800000;">&#39;</span><span style="color:#800000;">key</span><span style="color:#800000;">&#39;</span>: [<span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">d</span><span style="color:#800000;">&#39;</span><span style="color:#000000;">],
           </span><span style="color:#800000;">&#39;</span><span style="color:#800000;">data2</span><span style="color:#800000;">&#39;</span>: range(3)})</pre>
</div>
<table border="0"><tbody><tr><td><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-aa4fd6bc9742647a1158bf6ab234561a.png"><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-5df46d7fd45f0006509ad6a5574ff6a2.png"></td></tr></tbody></table>
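<p>The many-to-one relationship described above can be checked before merging. The following is a minimal sketch, not part of the original post: the variable names <code>left_counts</code>, <code>right_is_unique</code> and <code>checked</code> are illustrative, and the use of the <code>validate</code> argument of <code>pd.merge</code> is an addition for demonstration.</p>

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3)})

# Number of rows per key value on the left side (the "many" side)
left_counts = df1['key'].value_counts().to_dict()

# True when every key value on the right side appears exactly once (the "one" side)
right_is_unique = df2['key'].is_unique

# validate='m:1' asks merge to raise pandas.errors.MergeError
# if the right-hand keys turn out not to be unique
checked = pd.merge(df1, df2, on='key', validate='m:1')

print(left_counts)
print(right_is_unique)
print(checked)
```

<p>If df2 contained a repeated key, the <code>validate='m:1'</code> call would fail instead of silently multiplying rows.</p>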
<p>Note: if you do not specify which column to join on, merge uses the overlapping column names as the keys.</p>
<div class="cnblogs_code">
  <pre class="blockcode">pd.merge(df1, df2)            # key inferred from the overlapping column name
pd.merge(df1, df2, on='key')  # same join, with the key stated explicitly</pre>
</div>
<table border="0"><tbody><tr><td><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-e1c5e291af493a1d0620b1027718087a.png"><img alt="" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-516fa2487d66cbb31e451da79583e97a.png"></td></tr></tbody></table>
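<p>To make the default behaviour concrete, here is a small self-contained sketch (the names <code>implicit</code>, <code>explicit</code>, <code>shared</code> and <code>kept_keys</code> are illustrative). It assumes the same df1 and df2 as above and relies on <code>pd.merge</code> defaulting to an inner join, so keys present in only one frame are dropped.</p>

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                    'data2': range(3)})

implicit = pd.merge(df1, df2)            # key inferred from the shared column name
explicit = pd.merge(df1, df2, on='key')  # same join, key stated explicitly

# Column names common to both frames: this is what merge falls back to
shared = df1.columns.intersection(df2.columns).tolist()

# Inner join: 'c' (only in df1) and 'd' (only in df2) do not survive
kept_keys = sorted(explicit['key'].unique())

print(shared)
print(implicit.equals(explicit))
print(kept_keys)
```

<p>Stating <code>on='key'</code> explicitly is usually the safer habit, since the inferred key changes whenever the two frames happen to share additional column names.</p>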
<p>If the key columns have different names in the two objects, you can specify them separately:</p>
<div class="cnblogs_code">
  <pre class="blockcode">df3 &#61; DataFrame({<!-- --><span style="color:#800000;">&#39;</span><span style="color:#800000;">lkey</span><span style="color:#800000;">&#39;</span>: [<span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">c</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span><span style="color:#000000;">],
                 </span><span style="color:#800000;">&#39;</span><span style="color:#800000;">data1</span><span style="color:#800000;">&#39;</span>: range(7<span style="color:#000000;">)})
df4 </span>&#61; DataFrame({<!-- --><span style="color:#800000;">&#39;</span><span style="color:#800000;">rkey</span><span style="color:#800000;">&#39;</span>: [<span style="color:#800000;">&#39;</span><span style="color:#800000;">a</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">b</span><span style="color:#800000;">&#39;</span>, <span style="color:#800000;">&#39;</span><span style="color:#800000;">d</span><span style="color:#800000;">&#39;</span><span style="color:#000000;">],
                 </span><span style="color:#800000;">&#39;</span><span style="color:#800000;">data2</span><span style="color:#800000;">&#39;</span>: range(3<span s