Data Wrangling: Cleaning, Transforming, Merging, and Reshaping

Posted by an anonymous user, 2021-5-28 02:12
<div class="highlight highlight-source-python">
<pre class="blockcode"><span style="color:#d73a49">You can also read the original post on my GitHub; discussion is welcome:</span></pre>
<pre class="blockcode">https://github.com/AsuraDong/Blog/blob/master/Articles/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/%E6%95%B0%E6%8D%AE%E8%A7%84%E6%95%B4%E5%8C%96%EF%BC%9A%E6%B8%85%E7%90%86%E3%80%81%E8%BD%AC%E6%8D%A2%E3%80%81%E5%90%88%E5%B9%B6%E3%80%81%E9%87%8D%E5%A1%91.md</pre>
<pre class="blockcode">import pandas as pd
import numpy as np
from pandas import DataFrame
from pandas import Series</pre>
</div>
<h2>1. Merging Datasets</h2>
<ul><li>pd.merge(): how its various parameters are used</li></ul>
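<p>One of the most important <code>pd.merge()</code> parameters is <code>how</code>, which decides which keys survive the join. The sketch below is a minimal illustration assuming a recent pandas version; it rebuilds the same two toy frames used in the examples that follow and only counts the rows each join type produces. Key <code>c</code> appears only in the left frame and <code>d</code> only in the right frame, so they are what distinguish the four modes.</p>

```python
import pandas as pd
from pandas import DataFrame

# Same toy data as the df1/df2 examples: 'key' is the shared column
df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': list(range(7))})
df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data2': list(range(3))})

# how='inner' keeps keys present in both frames (the default),
# 'left'/'right' keep every key of one side, 'outer' keeps the union
results = {}
for how in ['inner', 'left', 'right', 'outer']:
    results[how] = pd.merge(df1, df2, on='key', how=how)
    print(how, len(results[how]))
```

Running it should print 6, 7, 7 and 8 rows respectively. In the left join, the row with key <code>c</code> gets <code>NaN</code> in <code>data2</code>; in the outer join, <code>d</code> additionally gets <code>NaN</code> in <code>data1</code>.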
<div class="highlight highlight-source-python">
<pre class="blockcode">df1 <span class="pl-k">&#61; DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">&#39;key<span class="pl-pds">&#39;:[<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;c<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;],\ <span class="pl-s"><span class="pl-pds">&#39;data1<span class="pl-pds">&#39;:[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">7)]}) df2 <span class="pl-k">&#61; DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">&#39;key<span class="pl-pds">&#39;:[<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;d<span class="pl-pds">&#39;],\ <span class="pl-s"><span class="pl-pds">&#39;data2<span class="pl-pds">&#39;:[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">3)]}) <span class="pl-c1">print(df1)</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></pre>
</div>
<pre class="blockcode"><code>   data1 key
0      0   b
1      1   b
2      2   a
3      3   c
4      4   a
5      5   a
6      6   b
</code></pre>
<div class="highlight highlight-source-python">
<pre class="blockcode"><span class="pl-c1">print(df2)</span></pre>
</div>
<pre class="blockcode"><code>   data2 key
0      0   a
1      1   b
2      2   d
</code></pre>
<div class="highlight highlight-source-python">
<pre class="blockcode"># Inner join of df1 and df2 (how='inner' is the default)
# on: the column to use as the join key; if omitted, pandas uses the columns the two frames share
print(pd.merge(df1, df2, on='key'))</pre>
</div>
<pre class="blockcode"><code>   data1 key  data2
0      0   b      1
1      1   b      1
2      6   b      1
3      2   a      0
4      4   a      0
5      5   a      0
</code></pre>
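<p>Two defaults are at work in the call above. Because <code>df1</code> and <code>df2</code> share only the column <code>key</code>, leaving out <code>on</code> should give the same result, and leaving out <code>how</code> gives an inner join. That is why the keys <code>c</code> and <code>d</code> are missing from the output. A small check of both claims, assuming a recent pandas version:</p>

```python
import pandas as pd
from pandas import DataFrame

df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': list(range(7))})
df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data2': list(range(3))})

explicit = pd.merge(df1, df2, on='key', how='inner')  # everything spelled out
implicit = pd.merge(df1, df2)  # on = shared columns, how = 'inner' by default

print(explicit.equals(implicit))
print(sorted(set(explicit['key'])))  # only keys present in both frames
```

Relying on the implicit <code>on</code> is convenient in quick exploration. Naming the key explicitly is safer once the frames may later gain other columns with the same name.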
<div class="highlight highlight-source-python">
<pre class="blockcode">df3 <span class="pl-k">&#61; DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">&#39;key1<span class="pl-pds">&#39;:[<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;c<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;],\ <span class="pl-s"><span class="pl-pds">&#39;data1<span class="pl-pds">&#39;:[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">7)]}) df4 <span class="pl-k">&#61; DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">&#39;key2<span class="pl-pds">&#39;:[<span class="pl-s"><span class="pl-pds">&#39;a<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;b<span class="pl-pds">&#39;,<span class="pl-s"><span class="pl-pds">&#39;d<span class="pl-pds">&#39;],\ <span class="pl-s"><span class="pl-pds">&#39;data2<span class="pl-pds">&#39;:[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">3)]}) <span class="pl-c"><span class="pl-c"># 如果没有重叠的列名 <span class="pl-c"><span class="pl-c"># 分别指定 <span class="pl-c1">print(pd.merge(df3,df4,<span class="pl-v">left_on<span class="pl-k">&#61;<span class="pl-s"><span class="pl-pds">&#34;key1<span class