Data Analysis Indexing Summary (Part 2): Pandas MultiIndex

Posted by an anonymous user on 2021-5-28 02:15
<div id="js_content">
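<p style="text-align: center"> Datawhale Essentials <br></p>
<p style="text-align: center"><strong>Author: Yan Zhongfeng (闫钟峰), </strong><strong>Datawhale Outstanding Learner</strong></p>
<p style="text-align: left">Foreword: This article covers creating a MultiIndex, slicing a multi-level index, using slice objects with a MultiIndex, swapping index levels, and related topics.</p>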
<h2>Creating a MultiIndex</h2>
<h4>1. Using from_tuples or from_arrays</h4>
<h4>① Creating a MultiIndex directly from a list of tuples</h4>
<pre class="blockcode"><code class="language-nginx">import pandas as pd

# Each tuple is one row label: (Upper, Lower)
tuples = [('A','a'),('A','b'),('B','a'),('B','b')]
mul_index = pd.MultiIndex.from_tuples(tuples, names=('Upper', 'Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
</code></pre>
<h4>② Building the tuples with zip</h4>
<p style="text-align: left">In essence, a MultiIndex is structured as a list of tuples, one tuple per row (or column) label.</p>
<pre class="blockcode"><code class="language-makefile">L1 = list('AABB')  # outer-level labels
L2 = list('abab')  # inner-level labels
tuples = list(zip(L1,L2))  # [('A','a'), ('A','b'), ('B','a'), ('B','b')]
mul_index = pd.MultiIndex.from_tuples(tuples, names=('Upper', 'Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
</code></pre>
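<p style="text-align: left">To make the "list of tuples" view concrete, here is a minimal sketch (not part of the original article; the variable names are illustrative). It iterates over a MultiIndex and also inspects its <code>levels</code> and <code>codes</code> attributes, which is how pandas stores the labels internally.</p>

```python
import pandas as pd

tuples = list(zip("AABB", "abab"))
mi = pd.MultiIndex.from_tuples(tuples, names=("Upper", "Lower"))

# Iterating over a MultiIndex yields one tuple per row label.
as_tuples = list(mi)

# Internally, each level stores its unique labels once ...
level_values = [list(level) for level in mi.levels]
# ... and integer codes record which label each row uses.
level_codes = [[int(c) for c in codes] for codes in mi.codes]

print(as_tuples)
print(level_values)
print(level_codes)
```

<p style="text-align: left">The tuple view is convenient for construction, while the levels/codes view explains why repeated labels are cheap to store.</p>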
<p style="text-align: left">Note: if the list of tuples used to build the MultiIndex is unsorted, the resulting DataFrame is unsorted as well.</p>
<pre class="blockcode"><code class="language-go"># Swapping the zip order yields labels that are not in sorted order
pd.DataFrame(
    {'Score':['perfect','good','fair','bad']},
    index=pd.MultiIndex.from_tuples(list(zip(L2,L1)), names=('Lower', 'Upper')),
)
</code></pre>
<p style="text-align: left">For easier use, sort the result with sort_index().</p>
<pre class="blockcode"><code class="language-php"># sort_index() orders the rows lexicographically by (Lower, Upper)
pd.DataFrame(
    {'Score':['perfect','good','fair','bad']},
    index=pd.MultiIndex.from_tuples(list(zip(L2,L1)), names=('Lower', 'Upper')),
).sort_index()
</code></pre>
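<p style="text-align: left">The effect of sort_index() can be checked programmatically. The sketch below (illustrative names, not from the original article) uses the same labels as above and inspects <code>is_monotonic_increasing</code> before and after sorting. A sorted index is what label-based slicing on a MultiIndex generally expects.</p>

```python
import pandas as pd

# Same labels as above, in an order that is not lexicographically sorted.
index = pd.MultiIndex.from_tuples(
    [("a", "A"), ("b", "A"), ("a", "B"), ("b", "B")], names=("Lower", "Upper")
)
df = pd.DataFrame({"Score": ["perfect", "good", "fair", "bad"]}, index=index)

was_sorted = df.index.is_monotonic_increasing  # False for this input

df_sorted = df.sort_index()
is_sorted = df_sorted.index.is_monotonic_increasing  # True after sorting

# Rows move together with their labels when the index is sorted.
sorted_rows = list(zip(df_sorted.index, df_sorted["Score"]))
print(was_sorted, is_sorted)
print(sorted_rows)
```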
<h4>③ Creating from an array (or a list of lists)</h4>
<h4>The inner lists are converted to tuples automatically</h4>
<pre class="blockcode"><code class="language-ini"># from_tuples also accepts a list of lists; each inner list becomes one tuple
arrays = [['A','a'],['A','b'],['B','a'],['B','b']]
mul_index = pd.MultiIndex.from_tuples(arrays, names=('Upper', 'Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
</code></pre>
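<p style="text-align: left">The heading above also names from_arrays, which the examples do not show. As a hedged sketch (variable names are illustrative): <code>pd.MultiIndex.from_arrays</code> takes one sequence per level (column-wise), whereas <code>from_tuples</code> takes one entry per row (row-wise). Both routes below produce the same index.</p>

```python
import pandas as pd

# from_arrays expects one sequence per level (column-wise input) ...
upper = ["A", "A", "B", "B"]
lower = ["a", "b", "a", "b"]
by_arrays = pd.MultiIndex.from_arrays([upper, lower], names=("Upper", "Lower"))

# ... while from_tuples expects one entry per row (row-wise input).
rows = [["A", "a"], ["A", "b"], ["B", "a"], ["B", "b"]]
by_tuples = pd.MultiIndex.from_tuples(rows, names=("Upper", "Lower"))

same = by_arrays.equals(by_tuples)
print(same)
```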
<h4>If the input is unsorted, the resulting MultiIndex is unsorted as well</h4>
<pre class="blockcode"><code class="language-nginx"># Same labels as before, but not in sorted order
arrays = [['A','a'],['B','a'],['A','b'],['B','b']]
mul_index = pd.MultiIndex.from_tuples(arrays, names=('Upper', 'Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
</code></pre>
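<h4>Although a MultiIndex is in essence a list of tuples, two MultiIndex objects that hold the same tuples in a different order are not equal. However, comparing them directly with == does not return a single False, as one might expect; it returns an element-wise boolean array.</h4>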
<pre class="blockcode"><code class="language-ini"># Sorted copy of the index built above
sorted_multi_index = pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index).sort_index().index
# Element-wise comparison: returns a boolean array, not a single bool
sorted_multi_index == mul_index
</code></pre>
<h4>By contrast, comparing a reordered plain list with the original returns a single False.</h4>
<pre class="blockcode"><code class="language-javascript">[('A', 'a'),  ('B', 'a'), ('A', 'b'), ('B', 'b')] == [('A', 'a'),  ('A', 'b'), ('B', 'a'), ('B', 'b')]
</code></pre>
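<p style="text-align: left">When a single yes/no answer is wanted for two indexes, one option is the <code>equals()</code> method. This is a sketch with illustrative names, not something the original article covers. It contrasts the element-wise result of == with <code>equals()</code>, and shows an order-insensitive check that sorts both sides first.</p>

```python
import pandas as pd

unsorted_mi = pd.MultiIndex.from_tuples(
    [("A", "a"), ("B", "a"), ("A", "b"), ("B", "b")], names=("Upper", "Lower")
)
sorted_mi = unsorted_mi.sort_values()

# == compares position by position and returns a boolean array.
elementwise = [bool(x) for x in (sorted_mi == unsorted_mi)]

# equals() answers "are these the same index?" with a single bool.
same_order = sorted_mi.equals(unsorted_mi)

# Order-insensitive check: compare both after sorting.
same_labels = sorted_mi.equals(unsorted_mi.sort_values())
print(elementwise, same_order, same_labels)
```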
<h4>Create a two-column array (ten rows of two random integers each)</h4>
<pre class="blockcode"><code class="language-ini">import numpy as np
arr = np.random.randint(1,5,20).reshape(-1,2)  # 10 rows of 2 random integers in [1, 5)
</code></pre>
<h4>The array must be converted to a list before pd.MultiIndex.from_tuples can build a hierarchical index from it.</h4>
<pre class="blockcode"><code class="language-php"># list(arr) yields one two-element row per index entry
pd.MultiIndex.from_tuples(list(arr),names=('left','right'))
</code></pre>
<h4>After building a DataFrame with this MultiIndex, remember to append sort_index() so that the result is neatly ordered.</h4>
<pre class="blockcode"><code class="language-php">dftemp = pd.DataFrame(
    np.random.randn(20).reshape(10,2),
    index=pd.MultiIndex.from_tuples(list(arr),names=('left','right')),
).sort_index()
</code></pre>
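<p style="text-align: left">As an alternative sketch (an assumption, not the author's approach), the two columns of the array can be passed to <code>from_arrays</code> as separate levels instead of converting the rows to a list. Because the labels are random draws from only four values, duplicate index entries are likely, and sort_index() still orders them. The seed and variable names are illustrative.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
arr = rng.integers(1, 5, size=(10, 2))  # 10 rows of 2 integers in [1, 5)

# Row-wise: one [left, right] pair per index entry.
by_rows = pd.MultiIndex.from_tuples(arr.tolist(), names=("left", "right"))

# Column-wise: one array per level.
by_cols = pd.MultiIndex.from_arrays([arr[:, 0], arr[:, 1]], names=("left", "right"))

# Both routes describe the same sequence of labels.
same_labels = [tuple(map(int, t)) for t in by_rows] == [
    tuple(map(int, t)) for t in by_cols
]

df = pd.DataFrame(rng.standard_normal((10, 2)), index=by_cols).sort_index()
print(same_labels, df.index.is_monotonic_increasing)
```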
<h4>MultiIndex constructors</h4>
<pre class="blockcode"><code class="language-apache">pd.MultiIndex.from_tuples??  # IPython/Jupyter syntax: show the docstring and source
# tuples: list / sequence of tuple-likes. Each tuple is the index of one row/column.
</code></pre>
<h4>2. Using from_product</h4>
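<h4>from_product builds the Cartesian product of its inputs. In many cases, though, you may not need every combination in the Cartesian product as an index entry.</h4>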
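<p style="text-align: left">A small sketch of that point (illustrative labels and names, not from the original article): <code>from_product</code> pairs every label of one level with every label of the other. When only some combinations are wanted, one option is to list them explicitly with <code>from_tuples</code>.</p>

```python
import pandas as pd

upper = ["A", "B"]
lower = ["a", "b", "c"]

# from_product pairs every upper label with every lower label: 2 * 3 = 6 entries.
full = pd.MultiIndex.from_product([upper, lower], names=("Upper", "Lower"))

# If only some combinations are wanted, list them explicitly instead.
wanted = [("A", "a"), ("A", "b"), ("B", "c")]
partial = pd.MultiIndex.from_tuples(wanted, names=("Upper", "Lower"))

print(len(full), len(partial))
print(list(full))
```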
<pre class="blockcode"><code class="language-nginx">L1 = ['A','B']
L2 = ['a','b']
pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
</code></pre>