<div class="highlight highlight-source-python">
<pre class="blockcode"><span style="color:#d73a49">You can also read the original article on my GitHub; discussion is welcome:</span></pre>
<pre class="blockcode">https://github.com/AsuraDong/Blog/blob/master/Articles/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/%E6%95%B0%E6%8D%AE%E8%A7%84%E6%95%B4%E5%8C%96%EF%BC%9A%E6%B8%85%E7%90%86%E3%80%81%E8%BD%AC%E6%8D%A2%E3%80%81%E5%90%88%E5%B9%B6%E3%80%81%E9%87%8D%E5%A1%91.md</pre>
<pre class="blockcode">import pandas as pd
import numpy as np
from pandas import DataFrame
from pandas import Series</pre>
</div>
<h2>1. Merging datasets</h2>
<ul><li>pd.merge(): how to use its various parameters</li></ul>
<div class="highlight highlight-source-python">
<pre class="blockcode"># Two frames sharing a 'key' column: df1 repeats keys, df2 lists each key once
df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': list(range(7))})
df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data2': list(range(3))})
print(df1)</pre>
</div>
<pre class="blockcode"><code>   data1 key
0      0   b
1      1   b
2      2   a
3      3   c
4      4   a
5      5   a
6      6   b
</code></pre>
<div class="highlight highlight-source-python">
<pre class="blockcode"><span class="pl-c1">print(df2)</span></pre>
</div>
<pre class="blockcode"><code>   data2 key
0      0   a
1      1   b
2      2   d
</code></pre>
<div class="highlight highlight-source-python">
<pre class="blockcode"># Inner join of df1 and df2 (inner is the default join type for pd.merge)
# on: the column(s) to use as the join key; if omitted, the overlapping column names are used
print(pd.merge(df1, df2, on='key'))</pre>
</div>
<pre class="blockcode"><code>   data1 key  data2
0      0   b      1
1      1   b      1
2      6   b      1
3      2   a      0
4      4   a      0
5      5   a      0
</code></pre>
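<p>The inner join above drops key <code>c</code> (present only in df1) and key <code>d</code> (present only in df2). As a rough sketch of how the <code>how</code> parameter changes this, the block below rebuilds the same two frames under the illustrative names <code>left</code> and <code>right</code> (these names, and the result variables, are assumptions and not part of the original article) and compares row counts for inner, left, and outer joins. <code>indicator=True</code> adds a <code>_merge</code> column recording where each row came from.</p>

```python
import pandas as pd

# Same data as df1 and df2 above, rebuilt so this sketch runs on its own.
# The names left/right are illustrative only.
left = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                     'data1': list(range(7))})
right = pd.DataFrame({'key': ['a', 'b', 'd'],
                      'data2': list(range(3))})

# how='inner' is the default: only keys found in both frames survive.
inner = pd.merge(left, right, on='key')

# how='left' keeps every row of the left frame; unmatched rows get NaN in data2.
left_join = pd.merge(left, right, on='key', how='left')

# how='outer' keeps keys from both sides; indicator=True adds a '_merge' column
# with the values 'both', 'left_only', or 'right_only'.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

print(len(inner), len(left_join), len(outer))  # 6 7 8
print(outer[['key', '_merge']])
```

<p>The left join has one more row than the inner join (key <code>c</code>), and the outer join adds one more again (key <code>d</code>).</p>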
<div class="highlight highlight-source-python">
<pre class="blockcode">df3 <span class="pl-k">= DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">'key1<span class="pl-pds">':[<span class="pl-s"><span class="pl-pds">'b<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'b<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'a<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'c<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'a<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'a<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'b<span class="pl-pds">'],\ <span class="pl-s"><span class="pl-pds">'data1<span class="pl-pds">':[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">7)]}) df4 <span class="pl-k">= DataFrame({<!-- --><span class="pl-s"><span class="pl-pds">'key2<span class="pl-pds">':[<span class="pl-s"><span class="pl-pds">'a<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'b<span class="pl-pds">',<span class="pl-s"><span class="pl-pds">'d<span class="pl-pds">'],\ <span class="pl-s"><span class="pl-pds">'data2<span class="pl-pds">':[i <span class="pl-k">for i <span class="pl-k">in <span class="pl-c1">range(<span class="pl-c1">3)]}) <span class="pl-c"><span class="pl-c"># 如果没有重叠的列名 <span class="pl-c"><span class="pl-c"># 分别指定 <span class="pl-c1">print(pd.merge(df3,df4,<span class="pl-v">left_on<span class="pl-k">=<span class="pl-s"><span class="pl-pds">"key1<span class |
|