MultiIndex / advanced indexing

This section covers indexing with a MultiIndex and other advanced indexing features.

See Indexing and selecting data for general indexing documentation.

Warning

Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called chained assignment and should be avoided. See Returning a view versus a copy.

See the cookbook for some advanced strategies.

Hierarchical indexing (MultiIndex)

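Hierarchical / multi-level indexing is very useful because it enables quite sophisticated data analysis and manipulation, especially when working with higher-dimensional data. In essence, it lets you store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures such as Series (1d) and DataFrame (2d).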

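In this section, we show what exactly we mean by "hierarchical" indexing and how it integrates with all of the pandas indexing functionality described above and in prior sections. Later, when discussing group by and pivoting and reshaping data, we show non-trivial applications that illustrate how it helps structure data for analysis.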

See the cookbook for some advanced strategies.

Creating a MultiIndex (hierarchical index) object

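The MultiIndex object is the hierarchical analogue of the standard Index object, which typically stores the axis labels in pandas objects. You can think of a MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.from_arrays()), an array of tuples (using MultiIndex.from_tuples()), the Cartesian product of several iterables (using MultiIndex.from_product()), or a DataFrame (using MultiIndex.from_frame()). The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples. The following examples demonstrate different ways to initialize a MultiIndex.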

In [1]: arrays = [
   ...:     ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
   ...:     ["one", "two", "one", "two", "one", "two", "one", "two"],
   ...: ]
   ...: 

In [2]: tuples = list(zip(*arrays))

In [3]: tuples
Out[3]: 
[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [4]: index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

In [5]: index
Out[5]: 
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [6]: s = pd.Series(np.random.randn(8), index=index)

In [7]: s
Out[7]: 
first  second
bar    one       0.469112
       two      -0.282863
baz    one      -1.509059
       two      -1.135632
foo    one       1.212112
       two      -0.173215
qux    one       0.119209
       two      -1.044236
dtype: float64

When you want every pairing of the elements in two iterables, it can be easier to use the MultiIndex.from_product() method:

In [8]: iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]

In [9]: pd.MultiIndex.from_product(iterables, names=["first", "second"])
Out[9]: 
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

You can also construct a MultiIndex directly from a DataFrame using the MultiIndex.from_frame() method. This is the complement of MultiIndex.to_frame().

In [10]: df = pd.DataFrame(
   ....:     [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
   ....:     columns=["first", "second"],
   ....: )
   ....: 

In [11]: pd.MultiIndex.from_frame(df)
Out[11]: 
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('foo', 'one'),
            ('foo', 'two')],
           names=['first', 'second'])
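
The list of constructors above mentions MultiIndex.from_arrays() and the Index constructor's inference from a list of tuples, neither of which is shown in the examples. A minimal sketch, assuming pandas 2.x and using small illustrative labels, that builds the same two-level index in several ways and compares them with MultiIndex.equals() (which compares the values, not the level names):

```python
import pandas as pd

# Two-level labels, matching the labels used in the examples above
arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
tuples = list(zip(*arrays))
names = ["first", "second"]

# Different constructors that should describe the same MultiIndex
from_arrays = pd.MultiIndex.from_arrays(arrays, names=names)
from_tuples = pd.MultiIndex.from_tuples(tuples, names=names)
from_product = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]], names=names)
from_frame = pd.MultiIndex.from_frame(pd.DataFrame(tuples, columns=names))

# The plain Index constructor infers a MultiIndex from a list of tuples
inferred = pd.Index(tuples)

for other in (from_tuples, from_product, from_frame, inferred):
    print(type(other).__name__, from_arrays.equals(other))
```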

As a convenience, you can pass a list of arrays directly to Series or DataFrame to construct a MultiIndex automatically:

In [12]: arrays = [
   ....:     np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
   ....:     np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
   ....: ]
   ....: 

In [13]: s = pd.Series(np.random.randn(8), index=arrays)

In [14]: s
Out[14]: 
bar  one   -0.861849
     two   -2.104569
baz  one   -0.494929
     two    1.071804
foo  one    0.721555
     two   -0.706771
qux  one   -1.039575
     two    0.271860
dtype: float64

In [15]: df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

In [16]: df
Out[16]: 
                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
    two  0.357021 -0.674600 -1.776904 -0.968914
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

All of the MultiIndex constructors accept a names argument, which stores string names for the levels themselves. If no names are provided, None is assigned:

In [17]: df.index.names
Out[17]: FrozenList([None, None])

This index can back any axis of a pandas object, and the number of levels of the index is up to you:

In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)

In [19]: df
Out[19]: 
first        bar                 baz  ...       foo       qux          
second       one       two       one  ...       two       one       two
A       0.895717  0.805244 -1.206412  ...  1.340309 -1.170299 -0.226169
B       0.410835  0.813850  0.132003  ... -1.187678  1.130127 -1.436737
C      -1.413681  1.607920  1.024180  ... -2.211372  0.974466 -2.006747

[3 rows x 8 columns]

In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Out[20]: 
first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one    -0.410001 -0.078638  0.545952 -1.219217 -1.226825  0.769804
      two    -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734
baz   one     0.959726 -1.110336 -0.619976  0.149748 -0.732339  0.687738
      two     0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849
foo   one    -0.954208  1.462696 -1.743161 -0.826591 -0.345352  1.314232
      two     0.690579  0.995761  2.396780  0.014871  3.357427 -0.317441

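We have "sparsified" the higher levels of the indexes to make the console output a bit easier on the eyes. Note that how the index is displayed can be controlled with the display.multi_sparse option, set via pandas.set_option() or temporarily via pandas.option_context():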

In [21]: with pd.option_context("display.multi_sparse", False):
   ....:     df
   ....: 

It is worth keeping in mind that nothing prevents you from using tuples as atomic labels on an axis:

In [22]: pd.Series(np.random.randn(8), index=tuples)
Out[22]: 
(bar, one)   -1.236269
(bar, two)    0.896171
(baz, one)   -0.487602
(baz, two)   -0.082240
(foo, one)   -2.182937
(foo, two)    0.380396
(qux, one)    0.084844
(qux, two)    0.432390
dtype: float64
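
To see the difference between a flat index of tuples and a real MultiIndex, here is a minimal sketch (assuming pandas 2.x) that uses the Index constructor's tupleize_cols parameter to keep each tuple as a single label:

```python
import pandas as pd

tuples = [("bar", "one"), ("bar", "two"), ("baz", "one")]

# tupleize_cols=False keeps each tuple as one atomic label
flat = pd.Index(tuples, tupleize_cols=False)
# By default, the Index constructor turns a list of tuples into a MultiIndex
hier = pd.Index(tuples)

print(type(flat).__name__, flat.nlevels)  # Index 1
print(type(hier).__name__, hier.nlevels)  # MultiIndex 2
print(("baz", "one") in flat)             # True: the whole tuple is a single label
```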

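The MultiIndex matters because it lets you perform grouping, selection, and reshaping operations, as described below and in subsequent areas of the documentation. As later sections show, you may find yourself working with hierarchically indexed data without creating a MultiIndex explicitly yourself. However, when loading data from a file, you may want to generate your own MultiIndex when preparing the data set.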

Reconstructing the level labels

The get_level_values() method returns a vector of the labels for each location at a particular level:

In [23]: index.get_level_values(0)
Out[23]: Index(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], dtype='object', name='first')

In [24]: index.get_level_values("second")
Out[24]: Index(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], dtype='object', name='second')
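
Conceptually, a MultiIndex stores each level's unique labels in levels and the per-row integer positions into those labels in codes. The sketch below rebuilds get_level_values() from those two pieces; it assumes there are no missing labels, because missing values are encoded as -1 in codes and would not round-trip through take():

```python
import pandas as pd

mi = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]], names=["first", "second"])

# levels[i] holds the unique labels of level i; codes[i] holds, for every row,
# the position of that row's label within levels[i].
for i in range(mi.nlevels):
    rebuilt = mi.levels[i].take(mi.codes[i])
    print(mi.names[i], list(rebuilt), rebuilt.equals(mi.get_level_values(i)))
```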

Basic indexing on an axis with a MultiIndex

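One of the important features of hierarchical indexing is that you can select data by a "partial" label identifying a subgroup in the data. Partial selection "drops" levels of the hierarchical index in the result, in a way completely analogous to selecting a column in a regular DataFrame: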

In [25]: df["bar"]
Out[25]: 
second       one       two
A       0.895717  0.805244
B       0.410835  0.813850
C      -1.413681  1.607920

In [26]: df["bar", "one"]
Out[26]: 
A    0.895717
B    0.410835
C   -1.413681
Name: (bar, one), dtype: float64

In [27]: df["bar"]["one"]
Out[27]: 
A    0.895717
B    0.410835
C   -1.413681
Name: one, dtype: float64

In [28]: s["qux"]
Out[28]: 
one   -1.039575
two    0.271860
dtype: float64
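
A small self-contained sketch of the same idea with integer data (the frame and labels are illustrative), showing that a partial key drops the selected level while a full key returns a Series:

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]], names=["first", "second"])
df = pd.DataFrame(np.arange(8).reshape(2, 4), index=["A", "B"], columns=cols)

sub = df["bar"]  # partial key: the "first" level is dropped from the columns
print(type(sub.columns).__name__, list(sub.columns))  # Index ['one', 'two']
print(df["bar", "one"].tolist())                      # full key -> Series [0, 4]
```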

See Cross-section with hierarchical index for how to select on a deeper level.

Defined levels

A MultiIndex keeps all the defined levels of an index, even if they are not actually used. You may notice this when slicing an index. For example:

In [29]: df.columns.levels  # original MultiIndex
Out[29]: FrozenList([['bar', 'baz', 'foo', 'qux'], ['one', 'two']])

In [30]: df[["foo","qux"]].columns.levels  # sliced
Out[30]: FrozenList([['bar', 'baz', 'foo', 'qux'], ['one', 'two']])

This is done to avoid recomputing the levels, which keeps slicing fast. If you want to see only the used levels, you can convert the index with to_numpy() or use the get_level_values() method:

In [31]: df[["foo", "qux"]].columns.to_numpy()
Out[31]: 
array([('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')],
      dtype=object)

# for a specific level
In [32]: df[["foo", "qux"]].columns.get_level_values(0)
Out[32]: Index(['foo', 'foo', 'qux', 'qux'], dtype='object', name='first')

To reconstruct the MultiIndex with only the used levels, use the remove_unused_levels() method:

In [33]: new_mi = df[["foo", "qux"]].columns.remove_unused_levels()

In [34]: new_mi.levels
Out[34]: FrozenList([['foo', 'qux'], ['one', 'two']])
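
The following sketch condenses the three ideas above onto a bare MultiIndex, without a DataFrame; the labels are illustrative:

```python
import pandas as pd

mi = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
sub = mi[4:]  # keeps only ("foo", "one") and ("foo", "two")

print(list(sub.levels[0]))                         # ['bar', 'baz', 'foo']: unused labels kept
print(list(sub.get_level_values("first")))         # ['foo', 'foo']: labels actually used
print(list(sub.remove_unused_levels().levels[0]))  # ['foo']
```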

Data alignment and using `reindex`

Operations between differently indexed objects that have a MultiIndex on an axis work as you would expect; data alignment works the same as with an Index of tuples:

In [35]: s + s[:-2]
Out[35]: 
bar  one   -1.723698
     two   -4.209138
baz  one   -0.989859
     two    2.143608
foo  one    1.443110
     two   -1.413542
qux  one         NaN
     two         NaN
dtype: float64

In [36]: s + s[::2]
Out[36]: 
bar  one   -1.723698
     two         NaN
baz  one   -0.989859
     two         NaN
foo  one    1.443110
     two         NaN
qux  one   -2.079150
     two         NaN
dtype: float64
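
Alignment is by label rather than by position. A minimal sketch with two Series whose MultiIndexes contain the same labels in different orders (the values are chosen so the sums are easy to check):

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
s1 = pd.Series([1, 2, 3], index=mi)
s2 = pd.Series([30, 20, 10], index=mi[::-1])  # same labels, reversed order

# Values are matched by the full (level 0, level 1) label, not by position
total = s1 + s2
print(total.loc[("a", "x")], total.loc[("a", "y")], total.loc[("b", "x")])  # 11 22 33
```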

The reindex() method of Series and DataFrame can be called with another MultiIndex, or even with a list or array of tuples:

In [37]: s.reindex(index[:3])
Out[37]: 
first  second
bar    one      -0.861849
       two      -2.104569
baz    one      -0.494929
dtype: float64

In [38]: s.reindex([("foo", "two"), ("bar", "one"), ("qux", "one"), ("baz", "one")])
Out[38]: 
foo  two   -0.706771
bar  one   -0.861849
qux  one   -1.039575
baz  one   -0.494929
dtype: float64

Advanced indexing with hierarchical index

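Syntactically integrating MultiIndex into advanced indexing with .loc is a bit challenging, but we have made every effort to do so. In general, MultiIndex keys take the form of tuples. For example, the following works as you would expect: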

In [39]: df = df.T

In [40]: df
Out[40]: 
                     A         B         C
first second                              
bar   one     0.895717  0.410835 -1.413681
      two     0.805244  0.813850  1.607920
baz   one    -1.206412  0.132003  1.024180
      two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372
qux   one    -1.170299  1.130127  0.974466
      two    -0.226169 -1.436737 -2.006747

In [41]: df.loc[("bar", "two")]
Out[41]: 
A    0.805244
B    0.813850
C    1.607920
Name: (bar, two), dtype: float64

df.loc['bar', 'two'] would also work in this example, but this shorthand notation can lead to ambiguity in general.

If you also want to index a specific column with .loc, you must use a tuple like this:

In [42]: df.loc[("bar", "two"), "A"]
Out[42]: 0.8052440253863785

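You do not have to specify all levels of the MultiIndex; you can pass only the first elements of the tuple. For example, you can use "partial" indexing to get all elements with bar in the first level as follows: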

In [43]: df.loc["bar"]
Out[43]: 
               A         B         C
second                              
one     0.895717  0.410835 -1.413681
two     0.805244  0.813850  1.607920

This is a shortcut for the slightly more verbose notation df.loc[('bar',),] (equivalent to df.loc['bar',] in this example).

"Partial" slicing also works quite nicely.

In [44]: df.loc["baz":"foo"]
Out[44]: 
                     A         B         C
first second                              
baz   one    -1.206412  0.132003  1.024180
      two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372

You can slice with a "range" of values by providing a slice of tuples.

In [45]: df.loc[("baz", "two"):("qux", "one")]
Out[45]: 
                     A         B         C
first second                              
baz   two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372
qux   one    -1.170299  1.130127  0.974466

In [46]: df.loc[("baz", "two"):"foo"]
Out[46]: 
                     A         B         C
first second                              
baz   two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372

Passing a list of labels or tuples works similarly to reindexing:

In [47]: df.loc[[("bar", "two"), ("qux", "one")]]
Out[47]: 
                     A         B         C
first second                              
bar   two     0.805244  0.813850  1.607920
qux   one    -1.170299  1.130127  0.974466

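Note

It is important to note that tuples and lists are not treated identically in pandas when it comes to indexing. Whereas a tuple is interpreted as one multi-level key, a list is used to specify several keys. In other words, tuples go horizontally (traversing levels), while lists go vertically (scanning levels).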

Importantly, a list of tuples indexes several complete MultiIndex keys, whereas a tuple of lists refers to several values within a level:

In [48]: s = pd.Series(
   ....:     [1, 2, 3, 4, 5, 6],
   ....:     index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
   ....: )
   ....: 

In [49]: s.loc[[("A", "c"), ("B", "d")]]  # list of tuples
Out[49]: 
A  c    1
B  d    5
dtype: int64

In [50]: s.loc[(["A", "B"], ["c", "d"])]  # tuple of lists
Out[50]: 
A  c    1
   d    2
B  c    4
   d    5
dtype: int64

Using slicers

You can slice a MultiIndex by providing multiple indexers.

You can provide any of the selectors you would use when indexing by label, including slices, lists of labels, single labels, and boolean indexers. See Selection by label.

You can use slice(None) to select all the contents of that level. You do not need to specify all the deeper levels; they are implied as slice(None).

As usual, both sides of the slicers are included, since this is label indexing.

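Warning

You should specify all axes in the .loc specifier, meaning the indexer for both the index and the columns. There are some ambiguous cases where the passed indexer could be misinterpreted as indexing both axes, rather than, say, the MultiIndex for the rows.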

You should do this:

df.loc[(slice("A1", "A3"), ...), :]  # noqa: E999

You should not do this:

df.loc[(slice("A1", "A3"), ...)]  # noqa: E999

In [51]: def mklbl(prefix, n):
   ....:     return ["%s%s" % (prefix, i) for i in range(n)]
   ....: 

In [52]: miindex = pd.MultiIndex.from_product(
   ....:     [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
   ....: )
   ....: 

In [53]: micolumns = pd.MultiIndex.from_tuples(
   ....:     [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")], names=["lvl0", "lvl1"]
   ....: )
   ....: 

In [54]: dfmi = (
   ....:     pd.DataFrame(
   ....:         np.arange(len(miindex) * len(micolumns)).reshape(
   ....:             (len(miindex), len(micolumns))
   ....:         ),
   ....:         index=miindex,
   ....:         columns=micolumns,
   ....:     )
   ....:     .sort_index()
   ....:     .sort_index(axis=1)
   ....: )
   ....: 

In [55]: dfmi
Out[55]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0    9    8   11   10
         D1   13   12   15   14
      C2 D0   17   16   19   18
...          ...  ...  ...  ...
A3 B1 C1 D1  237  236  239  238
      C2 D0  241  240  243  242
         D1  245  244  247  246
      C3 D0  249  248  251  250
         D1  253  252  255  254

[64 rows x 4 columns]

Basic MultiIndex slicing using slices, lists, and labels.

In [56]: dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]
Out[56]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A1 B0 C1 D0   73   72   75   74
         D1   77   76   79   78
      C3 D0   89   88   91   90
         D1   93   92   95   94
   B1 C1 D0  105  104  107  106
...          ...  ...  ...  ...
A3 B0 C3 D1  221  220  223  222
   B1 C1 D0  233  232  235  234
         D1  237  236  239  238
      C3 D0  249  248  251  250
         D1  253  252  255  254

[24 rows x 4 columns]

Instead of slice(None), you can use pandas.IndexSlice to allow a more natural syntax using ":".

In [57]: idx = pd.IndexSlice

In [58]: dfmi.loc[idx[:, :, ["C1", "C3"]], idx[:, "foo"]]
Out[58]: 
lvl0           a    b
lvl1         foo  foo
A0 B0 C1 D0    8   10
         D1   12   14
      C3 D0   24   26
         D1   28   30
   B1 C1 D0   40   42
...          ...  ...
A3 B0 C3 D1  220  222
   B1 C1 D0  232  234
         D1  236  238
      C3 D0  248  250
         D1  252  254

[32 rows x 2 columns]
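
pandas.IndexSlice does nothing special at selection time; it only converts the ":" syntax into the corresponding tuple of slice objects. A minimal sketch on a small, already sorted frame (the labels are illustrative) showing that both spellings select the same rows and that both slice endpoints are included:

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
# IndexSlice turns ':' into the equivalent slice objects
print(idx[:, "B1"])  # (slice(None, None, None), 'B1')

mi = pd.MultiIndex.from_product([["A0", "A1", "A2"], ["B0", "B1"]])
df = pd.DataFrame({"v": np.arange(6)}, index=mi)  # from_product gives a sorted index

explicit = df.loc[(slice("A1", "A2"), "B1"), :]
sugar = df.loc[idx["A1":"A2", "B1"], :]
print(explicit.equals(sugar), sugar["v"].tolist())  # True [3, 5]: "A2" is included
```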

Using this method, you can perform quite complicated selections on multiple axes at the same time.

In [59]: dfmi.loc["A1", (slice(None), "foo")]
Out[59]: 
lvl0        a    b
lvl1      foo  foo
B0 C0 D0   64   66
      D1   68   70
   C1 D0   72   74
      D1   76   78
   C2 D0   80   82
...       ...  ...
B1 C1 D1  108  110
   C2 D0  112  114
      D1  116  118
   C3 D0  120  122
      D1  124  126

[16 rows x 2 columns]

Using a boolean indexer, you can provide a selection based on the values.

In [61]: mask = dfmi[("a", "foo")] > 200

In [62]: dfmi.loc[idx[mask, :, ["C1", "C3"]], idx[:, "foo"]]
Out[62]: 
lvl0           a    b
lvl1         foo  foo
A3 B0 C1 D1  204  206
      C3 D0  216  218
         D1  220  222
   B1 C1 D0  232  234
         D1  236  238
      C3 D0  248  250
         D1  252  254

You can also pass the axis argument to .loc so that the passed slicers are interpreted on a single axis.

In [63]: dfmi.loc(axis=0)[:, :, ["C1", "C3"]]
Out[63]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C1 D0    9    8   11   10
         D1   13   12   15   14
      C3 D0   25   24   27   26
         D1   29   28   31   30
   B1 C1 D0   41   40   43   42
...          ...  ...  ...  ...
A3 B0 C3 D1  221  220  223  222
   B1 C1 D0  233  232  235  234
         D1  237  236  239  238
      C3 D0  249  248  251  250
         D1  253  252  255  254

[32 rows x 4 columns]

Furthermore, you can set values using these methods.

In [64]: df2 = dfmi.copy()

In [65]: df2.loc(axis=0)[:, :, ["C1", "C3"]] = -10

In [66]: df2
Out[66]: 
lvl0           a         b     
lvl1         bar  foo  bah  foo
A0 B0 C0 D0    1    0    3    2
         D1    5    4    7    6
      C1 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10
      C2 D0   17   16   19   18
...          ...  ...  ...  ...
A3 B1 C1 D1  -10  -10  -10  -10
      C2 D0  241  240  243  242
         D1  245  244  247  246
      C3 D0  -10  -10  -10  -10
         D1  -10  -10  -10  -10

[64 rows x 4 columns]

You can also use an alignable object as the right-hand side.

In [67]: df2 = dfmi.copy()

In [68]: df2.loc[idx[:, :, ["C1", "C3"]], :] = df2 * 1000

In [69]: df2
Out[69]: 
lvl0              a               b        
lvl1            bar     foo     bah     foo
A0 B0 C0 D0       1       0       3       2
         D1       5       4       7       6
      C1 D0    9000    8000   11000   10000
         D1   13000   12000   15000   14000
      C2 D0      17      16      19      18
...             ...     ...     ...     ...
A3 B1 C1 D1  237000  236000  239000  238000
      C2 D0     241     240     243     242
         D1     245     244     247     246
      C3 D0  249000  248000  251000  250000
         D1  253000  252000  255000  254000

[64 rows x 4 columns]

Cross-section

The xs() method of DataFrame additionally takes a level argument, which makes it easier to select data at a particular level of a MultiIndex.

In [70]: df
Out[70]: 
                     A         B         C
first second                              
bar   one     0.895717  0.410835 -1.413681
      two     0.805244  0.813850  1.607920
baz   one    -1.206412  0.132003  1.024180
      two     2.565646 -0.827317  0.569605
foo   one     1.431256 -0.076467  0.875906
      two     1.340309 -1.187678 -2.211372
qux   one    -1.170299  1.130127  0.974466
      two    -0.226169 -1.436737 -2.006747

In [71]: df.xs("one", level="second")
Out[71]: 
              A         B         C
first                              
bar    0.895717  0.410835 -1.413681
baz   -1.206412  0.132003  1.024180
foo    1.431256 -0.076467  0.875906
qux   -1.170299  1.130127  0.974466

# using the slicers
In [72]: df.loc[(slice(None), "one"), :]
Out[72]: 
                     A         B         C
first second                              
bar   one     0.895717  0.410835 -1.413681
baz   one    -1.206412  0.132003  1.024180
foo   one     1.431256 -0.076467  0.875906
qux   one    -1.170299  1.130127  0.974466

You can also select on the columns with xs by providing the axis argument.

In [73]: df = df.T

In [74]: df.xs("one", level="second", axis=1)
Out[74]: 
first       bar       baz       foo       qux
A      0.895717 -1.206412  1.431256 -1.170299
B      0.410835  0.132003 -0.076467  1.130127
C     -1.413681  1.024180  0.875906  0.974466

# using the slicers
In [75]: df.loc[:, (slice(None), "one")]
Out[75]: 
first        bar       baz       foo       qux
second       one       one       one       one
A       0.895717 -1.206412  1.431256 -1.170299
B       0.410835  0.132003 -0.076467  1.130127
C      -1.413681  1.024180  0.875906  0.974466

xs also allows selection with multiple keys.

In [76]: df.xs(("one", "bar"), level=("second", "first"), axis=1)
Out[76]: 
first        bar
second       one
A       0.895717
B       0.410835
C      -1.413681

# using the slicers
In [77]: df.loc[:, ("bar", "one")]
Out[77]: 
A    0.895717
B    0.410835
C   -1.413681
Name: (bar, one), dtype: float64

You can pass drop_level=False to xs to retain the level that was selected.

In [78]: df.xs("one", level="second", axis=1, drop_level=False)
Out[78]: 
first        bar       baz       foo       qux
second       one       one       one       one
A       0.895717 -1.206412  1.431256 -1.170299
B       0.410835  0.132003 -0.076467  1.130127
C      -1.413681  1.024180  0.875906  0.974466

Compare the above with the result of using drop_level=True (the default value).

In [79]: df.xs("one", level="second", axis=1, drop_level=True)
Out[79]: 
first       bar       baz       foo       qux
A      0.895717 -1.206412  1.431256 -1.170299
B      0.410835  0.132003 -0.076467  1.130127
C     -1.413681  1.024180  0.875906  0.974466
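
As a rough mental model, xs() with a level argument behaves like a slicer selection followed by dropping the selected level. The sketch below checks this on a small illustrative frame; it demonstrates the equivalence for this case and is not a description of how xs() is implemented:

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]], names=["first", "second"])
df = pd.DataFrame({"A": np.arange(4.0)}, index=mi)

via_xs = df.xs("one", level="second")
# The same rows selected with a slicer, then the "second" level dropped
via_loc = df.loc[(slice(None), "one"), :].droplevel("second")

print(via_xs.equals(via_loc), via_xs["A"].tolist())  # True [0.0, 2.0]
```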

Advanced reindexing and alignment

Using the level parameter of the reindex() and align() methods of pandas objects is useful for broadcasting values across a level. For instance:

In [80]: midx = pd.MultiIndex(
   ....:     levels=[["zero", "one"], ["x", "y"]], codes=[[1, 1, 0, 0], [1, 0, 1, 0]]
   ....: )
   ....: 

In [81]: df = pd.DataFrame(np.random.randn(4, 2), index=midx)

In [82]: df
Out[82]: 
               0         1
one  y  1.519970 -0.493662
     x  0.600178  0.274230
zero y  0.132885 -0.023688
     x  2.410179  1.450520

In [83]: df2 = df.groupby(level=0).mean()

In [84]: df2
Out[84]: 
             0         1
one   1.060074 -0.109716
zero  1.271532  0.713416

In [85]: df2.reindex(df.index, level=0)
Out[85]: 
               0         1
one  y  1.060074 -0.109716
     x  1.060074 -0.109716
zero y  1.271532  0.713416
     x  1.271532  0.713416

# aligning
In [86]: df_aligned, df2_aligned = df.align(df2, level=0)

In [87]: df_aligned
Out[87]: 
               0         1
one  y  1.519970 -0.493662
     x  0.600178  0.274230
zero y  0.132885 -0.023688
     x  2.410179  1.450520

In [88]: df2_aligned
Out[88]: 
               0         1
one  y  1.060074 -0.109716
     x  1.060074 -0.109716
zero y  1.271532  0.713416
     x  1.271532  0.713416
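
The broadcast of group means shown above can also be obtained with groupby(...).transform(), a different technique that keeps the original index directly. A minimal sketch with illustrative values comparing the two:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([["one", "zero"], ["y", "x"]])
df = pd.DataFrame({"v": [1.0, 3.0, 10.0, 20.0]}, index=midx)

means = df.groupby(level=0).mean()               # one row per level-0 label
broadcast = means.reindex(df.index, level=0)     # broadcast back over level 0
alt = df.groupby(level=0).transform("mean")      # alternative: transform keeps df's index

print(broadcast.equals(alt), broadcast["v"].tolist())  # True [2.0, 2.0, 15.0, 15.0]
```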

Swapping levels with `swaplevel`

The swaplevel() method can switch the order of two levels:

In [89]: df[:5]
Out[89]: 
               0         1
one  y  1.519970 -0.493662
     x  0.600178  0.274230
zero y  0.132885 -0.023688
     x  2.410179  1.450520

In [90]: df[:5].swaplevel(0, 1, axis=0)
Out[90]: 
               0         1
y one   1.519970 -0.493662
x one   0.600178  0.274230
y zero  0.132885 -0.023688
x zero  2.410179  1.450520

Reordering levels with `reorder_levels`

The reorder_levels() method generalizes the swaplevel method, allowing you to permute the hierarchical index levels in one step:

In [91]: df[:5].reorder_levels([1, 0], axis=0)
Out[91]: 
               0         1
y one   1.519970 -0.493662
x one   0.600178  0.274230
y zero  0.132885 -0.023688
x zero  2.410179  1.450520
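
A short sketch with illustrative labels showing that levels can be referred to by name, that reorder_levels() with two levels gives the same result as swaplevel(), and that neither method re-sorts the data:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([["one", "zero"], ["x", "y"]], names=["outer", "inner"])
s = pd.Series(range(4), index=midx)

swapped = s.swaplevel("outer", "inner")           # levels referenced by name
reordered = s.reorder_levels(["inner", "outer"])  # same result for two levels
print(list(swapped.index.names), swapped.index.equals(reordered.index))

# Swapping keeps the row order; sort_index groups rows by the new outer level
print(swapped.sort_index().tolist())  # [0, 2, 1, 3]
```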

Renaming names of an `Index` or `MultiIndex`

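The rename() method is used to rename the labels of a MultiIndex, and is typically used to rename the columns of a DataFrame. The columns argument of rename accepts a dictionary that includes only the columns you wish to rename.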

In [92]: df.rename(columns={0: "col0", 1: "col1"})
Out[92]: 
            col0      col1
one  y  1.519970 -0.493662
     x  0.600178  0.274230
zero y  0.132885 -0.023688
     x  2.410179  1.450520

This method can also be used to rename specific labels of the main index of the DataFrame.

In [93]: df.rename(index={"one": "two", "y": "z"})
Out[93]: 
               0         1
two  z  1.519970 -0.493662
     x  0.600178  0.274230
zero z  0.132885 -0.023688
     x  2.410179  1.450520

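The rename_axis() method is used to rename the name of an Index or MultiIndex. In particular, it can set the names of the levels of a MultiIndex, which is useful if reset_index() is later used to move the values from the MultiIndex into columns.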

In [94]: df.rename_axis(index=["abc", "def"])
Out[94]: 
                 0         1
abc  def                    
one  y    1.519970 -0.493662
     x    0.600178  0.274230
zero y    0.132885 -0.023688
     x    2.410179  1.450520

Note that the columns of a DataFrame are an index, so using rename_axis with the columns argument changes the name of that index.

In [95]: df.rename_axis(columns="Cols").columns
Out[95]: RangeIndex(start=0, stop=2, step=1, name='Cols')

Both rename and rename_axis accept a dictionary, a Series, or a mapping function to map labels/names to new values.
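
A minimal sketch, with illustrative labels, of why naming the levels before reset_index() helps, plus rename() with a mapping function restricted to one level through its level argument:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([["one", "zero"], ["x", "y"]])
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=midx)

# Unnamed levels become generic column names when moved out of the index
print(df.reset_index().columns.tolist())  # ['level_0', 'level_1', 'val']

# Naming the levels first gives meaningful column names
flat = df.rename_axis(index=["group", "key"]).reset_index()
print(flat.columns.tolist())  # ['group', 'key', 'val']

# rename accepts a function; level= restricts it to one level of the MultiIndex
upper = df.rename(index=str.upper, level=0)
print(upper.index.get_level_values(0).tolist())  # ['ONE', 'ONE', 'ZERO', 'ZERO']
```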

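When working with an Index object directly, rather than via a DataFrame, Index.set_names() can be used to change the names. For a MultiIndex, rename() (used below) behaves like set_names() and accepts the same level argument.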

In [96]: mi = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=["x", "y"])

In [97]: mi.names
Out[97]: FrozenList(['x', 'y'])

In [98]: mi2 = mi.rename("new name", level=0)

In [99]: mi2
Out[99]: 
MultiIndex([(1, 'a'),
            (1, 'b'),
            (2, 'a'),
            (2, 'b')],
           names=['new name', 'y'])

You cannot set the names of a MultiIndex via a level.

In [100]: mi.levels[0].name = "name via level"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[100], line 1
----> 1 mi.levels[0].name = "name via level"

File ~/work/pandas/pandas/pandas/core/indexes/base.py:1697, in Index.name(self, value)
   1693 @name.setter
   1694 def name(self, value: Hashable) -> None:
   1695     if self._no_setting_name:
   1696         # Used in MultiIndex.levels to avoid silently ignoring name updates.
-> 1697         raise RuntimeError(
   1698             "Cannot set name on a level of a MultiIndex. Use "
   1699             "'MultiIndex.set_names' instead."
   1700         )
   1701     maybe_extract_name(value, None, type(self))
   1702     self._name = value

RuntimeError: Cannot set name on a level of a MultiIndex. Use 'MultiIndex.set_names' instead.

Use Index.set_names() instead.
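
The sketch below (assuming pandas 2.x, where assigning a name through levels raises as shown above) contrasts the failing assignment with set_names(), which returns a new MultiIndex and leaves the original untouched:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=["x", "y"])

try:
    mi.levels[0].name = "name via level"  # not allowed
except RuntimeError as err:
    print("RuntimeError:", err)

renamed = mi.set_names("name via set_names", level=0)  # returns a new MultiIndex
print(list(renamed.names), list(mi.names))  # original names are unchanged
```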

Sorting a `MultiIndex`

For objects indexed by a MultiIndex to be indexed and sliced efficiently, they need to be sorted. As with any index, you can use sort_index().

In [101]: import random

In [102]: random.shuffle(tuples)

In [103]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))

In [104]: s
Out[104]: 
baz  two    0.206053
foo  two   -0.251905
bar  one   -2.213588
     two    1.063327
baz  one    1.266143
foo  one    0.299368
qux  one   -0.863838
     two    0.408204
dtype: float64

In [105]: s.sort_index()
Out[105]: 
bar  one   -2.213588
     two    1.063327
baz  one    1.266143
     two    0.206053
foo  one    0.299368
     two   -0.251905
qux  one   -0.863838
     two    0.408204
dtype: float64

In [106]: s.sort_index(level=0)
Out[106]: 
bar  one   -2.213588
     two    1.063327
baz  one    1.266143
     two    0.206053
foo  one    0.299368
     two   -0.251905
qux  one   -0.863838
     two    0.408204
dtype: float64

In [107]: s.sort_index(level=1)
Out[107]: 
bar  one   -2.213588
baz  one    1.266143
foo  one    0.299368
qux  one   -0.863838
bar  two    1.063327
baz  two    0.206053
foo  two   -0.251905
qux  two    0.408204
dtype: float64

If the MultiIndex levels are named, you can also pass a level name to sort_index.

In [108]: s.index = s.index.set_names(["L1", "L2"])

In [109]: s.sort_index(level="L1")
Out[109]: 
L1   L2 
bar  one   -2.213588
     two    1.063327
baz  one    1.266143
     two    0.206053
foo  one    0.299368
     two   -0.251905
qux  one   -0.863838
     two    0.408204
dtype: float64

In [110]: s.sort_index(level="L2")
Out[110]: 
L1   L2 
bar  one   -2.213588
baz  one    1.266143
foo  one    0.299368
qux  one   -0.863838
bar  two    1.063327
baz  two    0.206053
foo  two   -0.251905
qux  two    0.408204
dtype: float64

On higher-dimensional objects, you can sort any of the other axes by level if they have a MultiIndex:

In [111]: df.T.sort_index(level=1, axis=1)
Out[111]: 
        one      zero       one      zero
          x         x         y         y
0  0.600178  2.410179  1.519970  0.132885
1  0.274230  1.450520 -0.493662 -0.023688

Indexing works even if the data are not sorted, but it will be rather inefficient (and show a PerformanceWarning). It also returns a copy of the data rather than a view:

In [112]: dfm = pd.DataFrame(
   .....:     {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.random.rand(4)}
   .....: )
   .....: 

In [113]: dfm = dfm.set_index(["jim", "joe"])

In [114]: dfm
Out[114]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   z    0.537020
    y    0.110968

In [115]: dfm.loc[(1, 'z')]
Out[115]: 
           jolie
jim joe         
1   z    0.53702

Furthermore, if you try to index something that is not fully lexsorted, an exception can be raised:

In [116]: dfm.loc[(0, 'y'):(1, 'z')]
---------------------------------------------------------------------------
UnsortedIndexError                        Traceback (most recent call last)
Cell In[116], line 1
----> 1 dfm.loc[(0, 'y'):(1, 'z')]

File ~/work/pandas/pandas/pandas/core/indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
   1189 maybe_callable = com.apply_if_callable(key, self.obj)
   1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1411, in _LocIndexer._getitem_axis(self, key, axis)
   1409 if isinstance(key, slice):
   1410     self._validate_key(key, axis)
-> 1411     return self._get_slice_axis(key, axis=axis)
   1412 elif com.is_bool_indexer(key):
   1413     return self._getbool_axis(key, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1443, in _LocIndexer._get_slice_axis(self, slice_obj, axis)
   1440     return obj.copy(deep=False)
   1442 labels = obj._get_axis(axis)
-> 1443 indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1445 if isinstance(indexer, slice):
   1446     return self.obj._slice(indexer, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6678, in Index.slice_indexer(self, start, end, step)
   6634 def slice_indexer(
   6635     self,
   6636     start: Hashable | None = None,
   6637     end: Hashable | None = None,
   6638     step: int | None = None,
   6639 ) -> slice:
   6640     """
   6641     Compute the slice indexer for input labels and step.
   6642 
   (...)
   6676     slice(1, 3, None)
   6677     """
-> 6678     start_slice, end_slice = self.slice_locs(start, end, step=step)
   6680     # return a slice
   6681     if not is_scalar(start_slice):

File ~/work/pandas/pandas/pandas/core/indexes/multi.py:2923, in MultiIndex.slice_locs(self, start, end, step)
   2871 """
   2872 For an ordered MultiIndex, compute the slice locations for input
   2873 labels.
   (...)
   2919                       sequence of such.
   2920 """
   2921 # This function adds nothing to its parent implementation (the magic
   2922 # happens in get_slice_bound method), but it adds meaningful doc.
-> 2923 return super().slice_locs(start, end, step)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6904, in Index.slice_locs(self, start, end, step)
   6902 start_slice = None
   6903 if start is not None:
-> 6904     start_slice = self.get_slice_bound(start, "left")
   6905 if start_slice is None:
   6906     start_slice = 0

File ~/work/pandas/pandas/pandas/core/indexes/multi.py:2867, in MultiIndex.get_slice_bound(self, label, side)
   2865 if not isinstance(label, tuple):
   2866     label = (label,)
-> 2867 return self._partial_tup_index(label, side=side)

File ~/work/pandas/pandas/pandas/core/indexes/multi.py:2927, in MultiIndex._partial_tup_index(self, tup, side)
   2925 def _partial_tup_index(self, tup: tuple, side: Literal["left", "right"] = "left"):
   2926     if len(tup) > self._lexsort_depth:
-> 2927         raise UnsortedIndexError(
   2928             f"Key length ({len(tup)}) was greater than MultiIndex lexsort depth "
   2929             f"({self._lexsort_depth})"
   2930         )
   2932     n = len(tup)
   2933     start, end = 0, len(self)

UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'

The is_monotonic_increasing attribute of a MultiIndex shows whether the index is sorted:

In [117]: dfm.index.is_monotonic_increasing
Out[117]: False

In [118]: dfm = dfm.sort_index()

In [119]: dfm
Out[119]: 
            jolie
jim joe          
0   x    0.490671
    x    0.120248
1   y    0.110968
    z    0.537020

In [120]: dfm.index.is_monotonic_increasing
Out[120]: True

Now the selection works as expected.

In [121]: dfm.loc[(0, "y"):(1, "z")]
Out[121]: 
            jolie
jim joe          
1   y    0.110968
    z    0.537020
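
A minimal sketch of a guard that sorts only when needed before label slicing. ensure_sorted is a hypothetical helper written for this example, not a pandas function, and the data values are illustrative:

```python
import pandas as pd


def ensure_sorted(obj):
    """Hypothetical helper: return obj sorted by index, skipping the sort if it is already monotonic."""
    if obj.index.is_monotonic_increasing:
        return obj
    return obj.sort_index()


dfm = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])

dfm_sorted = ensure_sorted(dfm)
print(dfm.index.is_monotonic_increasing, dfm_sorted.index.is_monotonic_increasing)  # False True
# Tuple-range slicing is safe on the sorted frame; (0, "y") need not exist
print(dfm_sorted.loc[(0, "y"):(1, "z"), "jolie"].tolist())  # [0.4, 0.3]
```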

Take methods

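Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provide the take() method, which retrieves elements along a given axis at the given positions. The given positions must be a list or an ndarray of integer index positions. take also accepts negative integers as positions relative to the end of the object.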

In [122]: index = pd.Index(np.random.randint(0, 1000, 10))

In [123]: index
Out[123]: Index([214, 502, 712, 567, 786, 175, 993, 133, 758, 329], dtype='int64')

In [124]: positions = [0, 9, 3]

In [125]: index[positions]
Out[125]: Index([214, 329, 567], dtype='int64')

In [126]: index.take(positions)
Out[126]: Index([214, 329, 567], dtype='int64')

In [127]: ser = pd.Series(np.random.randn(10))

In [128]: ser.iloc[positions]
Out[128]: 
0   -0.179666
9    1.824375
3    0.392149
dtype: float64

In [129]: ser.take(positions)
Out[129]: 
0   -0.179666
9    1.824375
3    0.392149
dtype: float64
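The negative-position behavior described above can be sketched with a small labelled Series (the values and labels are illustrative assumptions). It also shows that take() ignores the labels entirely, so it matches .iloc rather than .loc.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# take() is purely positional: -1 is the last element, -2 the one before it.
last_two = ser.take([-1, -2])

# The same selection expressed with .iloc
same = ser.iloc[[-1, -2]]
print(last_two)
```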

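For a DataFrame, the given positions should be a 1-D list or ndarray that specifies row or column positions. By default take() selects rows; pass axis=1 to select columns.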

In [130]: frm = pd.DataFrame(np.random.randn(5, 3))

In [131]: frm.take([1, 4, 3])
Out[131]: 
          0         1         2
1 -1.237881  0.106854 -1.276829
4  0.629675 -1.425966  1.857704
3  0.979542 -1.633678  0.615855

In [132]: frm.take([0, 2], axis=1)
Out[132]: 
          0         2
0  0.595974  0.601544
1 -1.237881 -1.276829
2 -0.767101  1.499591
3  0.979542  0.615855
4  0.629675  1.857704
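Because `frm` above uses default labels, positions and labels coincide. The sketch below uses made-up integer labels that differ from the positions, to make it visible that take() selects by position along either axis.

```python
import pandas as pd

# Integer labels that differ from the positions make the distinction visible.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]},
    index=[10, 20, 30],
)

rows = df.take([2, 0])          # 3rd and 1st rows -> labels 30 and 10
cols = df.take([0, 2], axis=1)  # 1st and 3rd columns -> "x" and "z"
print(rows)
print(cols)
```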

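It is important to note that the take method on pandas objects is not intended to work with boolean indices: the booleans are treated as the integers 0 and 1, which can return unexpected results.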

In [133]: arr = np.random.randn(10)

In [134]: arr.take([False, False, True, True])
Out[134]: array([-1.1935, -1.1935,  0.6775,  0.6775])

In [135]: arr[[0, 1]]
Out[135]: array([-1.1935,  0.6775])

In [136]: ser = pd.Series(np.random.randn(10))

In [137]: ser.take([False, False, True, True])
Out[137]: 
0    0.233141
0    0.233141
1   -0.223540
1   -0.223540
dtype: float64

In [138]: ser.iloc[[0, 1]]
Out[138]: 
0    0.233141
1   -0.223540
dtype: float64
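If you have a boolean mask but still want to go through take(), one option is to convert the mask to integer positions first, for example with np.flatnonzero. This is a sketch with illustrative data; plain boolean indexing with [] is usually the simpler choice.

```python
import numpy as np
import pandas as pd

ser = pd.Series([0.5, -1.0, 2.0, 3.5])
mask = np.array([False, False, True, True])

# Convert the boolean mask to integer positions before calling take()
positions = np.flatnonzero(mask)  # positions where mask is True: 2 and 3
via_take = ser.take(positions)

# Boolean indexing with [] uses the mask directly
via_mask = ser[mask]
print(via_take)
```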

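Finally, a small note on performance: because the take method handles a narrower range of inputs, it can be faster than fancy indexing. In the timings below the gain is large for a NumPy array and small for a Series.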

In [139]: arr = np.random.randn(10000, 5)

In [140]: indexer = np.arange(10000)

In [141]: random.shuffle(indexer)

In [142]: %timeit arr[indexer]
   .....: %timeit arr.take(indexer, axis=0)
   .....: 
247 us +- 3.14 us per loop (mean +- std. dev. of 7 runs, 1,000 loops each)
75.4 us +- 2.12 us per loop (mean +- std. dev. of 7 runs, 10,000 loops each)

In [143]: ser = pd.Series(arr[:, 0])

In [144]: %timeit ser.iloc[indexer]
   .....: %timeit ser.take(indexer)
   .....: 
143 us +- 5.77 us per loop (mean +- std. dev. of 7 runs, 10,000 loops each)
133 us +- 8.27 us per loop (mean +- std. dev. of 7 runs, 10,000 loops each)
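Outside IPython, a comparable measurement can be sketched with the standard timeit module. The seeded generator and np.random.default_rng().permutation replace random.shuffle here as an assumption for reproducibility; absolute and relative timings depend on hardware and NumPy version, so only the equality of the two results is guaranteed.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((10_000, 5))
indexer = rng.permutation(10_000)  # shuffled row positions

# Both forms select the same rows; only the speed may differ.
fancy = arr[indexer]
taken = arr.take(indexer, axis=0)

n = 200
t_fancy = timeit.timeit(lambda: arr[indexer], number=n) / n
t_take = timeit.timeit(lambda: arr.take(indexer, axis=0), number=n) / n
print(f"fancy indexing: {t_fancy * 1e6:.1f} us per call")
print(f"take:           {t_take * 1e6:.1f} us per call")
```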

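Index types#

We have discussed MultiIndex in the previous sections pretty extensively. Documentation about DatetimeIndex and PeriodIndex is shown here, and documentation about TimedeltaIndex is found here.

In the following subsections we will highlight some other index types.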

CategoricalIndex#

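CategoricalIndex is a type of index that is useful for supporting indexing with duplicates. It is a container around a Categorical and allows efficient indexing and storage of an index with a large number of duplicated elements.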

In [145]: from pandas.api.types import CategoricalDtype

In [146]: df = pd.DataFrame({"A": np.arange(6), "B": list("aabbca")})

In [147]: df["B"] = df["B"].astype(CategoricalDtype(list("cab")))

In [148]: df
Out[148]: 
   A  B
0  0  a
1  1  a
2  2  b
3  3  b
4  4  c
5  5  a

In [149]: df.dtypes
Out[149]: 
A       int64
B    category
dtype: object

In [150]: df["B"].cat.categories
Out[150]: Index(['c', 'a', 'b'], dtype='object')

Setting the index will create a CategoricalIndex:

In [151]: df2 = df.set_index("B")

In [152]: df2.index
Out[152]: CategoricalIndex(['a', 'a', 'b', 'b', 'c', 'a'], categories=['c', 'a', 'b'], ordered=False, dtype='category', name='B')

Indexing with __getitem__/.iloc/.loc works similarly to an Index with duplicates. The indexers must be in the categories, otherwise the operation raises a KeyError.

In [153]: df2.loc["a"]
Out[153]: 
   A
B   
a  0
a  1
a  5

The CategoricalIndex is preserved after indexing:

In [154]: df2.loc["a"].index
Out[154]: CategoricalIndex(['a', 'a', 'a'], categories=['c', 'a', 'b'], ordered=False, dtype='category', name='B')
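The same index can also be built directly with pd.CategoricalIndex instead of set_index (an equivalent construction chosen here for brevity). The sketch shows both halves of the rule above: a category label returns all matching rows, and a label outside the categories raises KeyError.

```python
import pandas as pd

idx = pd.CategoricalIndex(list("aabbca"), categories=list("cab"), name="B")
df2 = pd.DataFrame({"A": range(6)}, index=idx)

# A label that is a category returns every matching row, like a duplicated Index
subset = df2.loc["a"]

# A label that is not among the categories raises KeyError
try:
    df2.loc["z"]
    raised = False
except KeyError:
    raised = True

print(subset)
print("KeyError raised:", raised)
```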

Sorting the index sorts by the order of the categories (the index was created with CategoricalDtype(list('cab')), so the sorted order is c, a, b).

In [155]: df2.sort_index()
Out[155]: 
   A
B   
c  4
a  0
a  1
a  5
b  2
b  3
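Since the sort order comes from the categories rather than from the labels themselves, reordering the categories changes the result of sort_index(). A minimal sketch using CategoricalIndex.reorder_categories and DataFrame.set_axis on the same toy data:

```python
import pandas as pd

idx = pd.CategoricalIndex(list("aabbca"), categories=list("cab"), name="B")
df2 = pd.DataFrame({"A": range(6)}, index=idx)

by_cab = df2.sort_index()  # follows the category order c, a, b

# Reorder the categories to b, c, a and sort again
df_bca = df2.set_axis(df2.index.reorder_categories(list("bca")))
by_bca = df_bca.sort_index()

print(list(by_cab.index))
print(list(by_bca.index))
```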

Groupby operations on the index preserve the index type as well:

In [156]: df2.groupby(level=0, observed=True).sum()
Out[156]: 
   A
B   
c  4
a  6
b  5

In [157]: df2.groupby(level=0, observed=True).sum().index
Out[157]: CategoricalIndex(['c', 'a', 'b'], categories=['c', 'a', 'b'], ordered=False, dtype='category', name='B')
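The observed argument matters once a declared category never occurs in the data. This sketch groups by a categorical column rather than by level=0 (a simplification assumed for clarity); with observed=False the empty category appears in the result with a sum of 0, and the result index is again a CategoricalIndex.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        # "c" is a declared category that never occurs in the data
        "B": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    }
)

all_cats = df.groupby("B", observed=False)["A"].sum()  # includes empty "c"
seen_only = df.groupby("B", observed=True)["A"].sum()  # only a and b

print(all_cats)
print(seen_only)
```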

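Reindexing operations return a resulting index based on the type of the passed indexer. Passing a list returns a plain Index; indexing with a Categorical returns a CategoricalIndex, indexed according to the categories of the passed Categorical dtype. This allows one to arbitrarily index these even with values not in the categories, similarly to how you can reindex any pandas index.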

In [158]: df3 = pd.DataFrame(
   .....:     {"A": np.arange(3), "B": pd.Series(list("abc")).astype("category")}
   .....: )
   .....: 

In [159]: df3 = df3.set_index("B")

In [160]: df3
Out[160]: 
   A
B   
a  0
b  1
c  2

In [161]: df3.reindex(["a", "e"])
Out[161]: 
     A
B     
a  0.0
e  NaN

In [162]: df3.reindex(["a", "e"]).index
Out[162]: Index(['a', 'e'], dtype='object', name='B')

In [163]: df3.reindex(pd.Categorical(["a", "e"], categories=list("abe")))
Out[163]: 
     A
B     
a  0.0
e  NaN

In [164]: df3.reindex(pd.Categorical(["a", "e"], categories=list("abe"))).index
Out[164]: CategoricalIndex(['a', 'e'], categories=['a', 'b', 'e'], ordered=False, dtype='category', name='B')

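Warning

Comparison operations on CategoricalIndex objects require the same categories, otherwise a TypeError is raised. Reshaping operations are more lenient: in the example below, pd.concat() combines two indexes with different categories without raising, as the output shows.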

In [165]: df4 = pd.DataFrame({"A": np.arange(2), "B": list("ba")})

In [166]: df4["B"] = df4["B"].astype(CategoricalDtype(list("ab")))

In [167]: df4 = df4.set_index("B")

In [168]: df4.index
Out[168]: CategoricalIndex(['b', 'a'], categories=['a', 'b'], ordered=False, dtype='category', name='B')

In [169]: df5 = pd.DataFrame({"A": np.arange(2), "B": list("bc")})

In [170]: df5["B"] = df5["B"].astype(CategoricalDtype(list("bc")))

In [171]: df5 = df5.set_index("B")

In [172]: df5.index
Out[172]: CategoricalIndex(['b', 'c'], categories=['b', 'c'], ordered=False, dtype='category', name='B')

In [173]: pd.concat([df4, df5])
Out[173]: 
   A
B   
b  0
a  1
b  0
c  1

RangeIndex#

RangeIndex is a subclass of Index that provides the default index for all DataFrame and Series objects. RangeIndex is an optimized version of Index that can represent a monotonic ordered set. It is analogous to Python's range type. A RangeIndex always has an int64 dtype.

In [174]: idx = pd.RangeIndex(5)

In [175]: idx
Out[175]: RangeIndex(start=0, stop=5, step=1)

RangeIndex is the default index for all DataFrame and Series objects, including DataFrame columns:

In [176]: ser = pd.Series([1, 2, 3])

In [177]: ser.index
Out[177]: RangeIndex(start=0, stop=3, step=1)

In [178]: df = pd.DataFrame([[1, 2], [3, 4]])

In [179]: df.index
Out[179]: RangeIndex(start=0, stop=2, step=1)

In [180]: df.columns
Out[180]: RangeIndex(start=0, stop=2, step=1)

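RangeIndex behaves similarly to an Index with an int64 dtype. Operations on a RangeIndex whose result cannot be represented by a RangeIndex, but should have an integer dtype, are converted to an Index with int64 dtype. For example: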

In [181]: idx[[0, 2]]
Out[181]: Index([0, 2], dtype='int64')
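Whether the result stays a RangeIndex depends on whether it is still an arithmetic progression. In the sketch below a positional slice keeps the RangeIndex, while picking unevenly spaced positions produces a materialized int64 Index. The positions are chosen to be unevenly spaced on purpose, so the outcome does not depend on pandas detecting evenly spaced results.

```python
import pandas as pd

idx = pd.RangeIndex(0, 10, 2)  # 0, 2, 4, 6, 8

# A positional slice is still an arithmetic progression, so it stays a RangeIndex
sliced = idx[1:4]

# Arbitrary positions (values 0, 2, 8 are not evenly spaced) need a materialized Index
picked = idx[[0, 1, 4]]

print(sliced)
print(picked)
```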

IntervalIndex#

IntervalIndex, together with its own dtype, IntervalDtype, and the Interval scalar type, provides first-class support for interval notation in pandas.

IntervalIndex enables some unique indexing behavior and is also used as the return type for the categories in cut() and qcut().

Indexing with an `IntervalIndex`#

An IntervalIndex can be used as the index of a Series or a DataFrame.

In [182]: df = pd.DataFrame(
   .....:     {"A": [1, 2, 3, 4]}, index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4])
   .....: )
   .....: 

In [183]: df
Out[183]: 
        A
(0, 1]  1
(1, 2]  2
(2, 3]  3
(3, 4]  4

Label-based indexing via .loc along the edges of an interval works as you would expect, selecting that particular interval.

In [184]: df.loc[2]
Out[184]: 
A    2
Name: (1, 2], dtype: int64

In [185]: df.loc[[2, 3]]
Out[185]: 
        A
(1, 2]  2
(2, 3]  3

Selecting a label contained within an interval also selects that interval.

In [186]: df.loc[2.5]
Out[186]: 
A    3
Name: (2, 3], dtype: int64

In [187]: df.loc[[2.5, 3.5]]
Out[187]: 
        A
(2, 3]  3
(3, 4]  4
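The scalar lookups above can be reproduced on the index itself. IntervalIndex.contains tests each interval element-wise, and get_loc returns the position of the interval that holds the scalar; note that the default right-closed intervals place a break point such as 2 in (1, 2].

```python
import pandas as pd

ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4])  # (0, 1], (1, 2], (2, 3], (3, 4]

# Element-wise containment test for a scalar
inside = ii.contains(2.5)  # only (2, 3] contains 2.5
edge = ii.contains(2)      # intervals are right-closed, so 2 falls in (1, 2]

# get_loc returns the position of the interval holding the scalar
pos = ii.get_loc(2.5)
print(inside, edge, pos)
```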

Selecting with an Interval returns only exact matches.

In [188]: df.loc[pd.Interval(1, 2)]
Out[188]: 
A    2
Name: (1, 2], dtype: int64

Trying to select an Interval that is not exactly contained in the IntervalIndex raises a KeyError.

In [189]: df.loc[pd.Interval(0.5, 2.5)]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[189], line 1
----> 1 df.loc[pd.Interval(0.5, 2.5)]

File ~/work/pandas/pandas/pandas/core/indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
   1189 maybe_callable = com.apply_if_callable(key, self.obj)
   1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1431, in _LocIndexer._getitem_axis(self, key, axis)
   1429 # fall thru to straight lookup
   1430 self._validate_key(key, axis)
-> 1431 return self._get_label(key, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1381, in _LocIndexer._get_label(self, label, axis)
   1379 def _get_label(self, label, axis: AxisInt):
   1380     # GH#5567 this will fail if the label is not present in the axis.
-> 1381     return self.obj.xs(label, axis=axis)

File ~/work/pandas/pandas/pandas/core/generic.py:4320, in NDFrame.xs(self, key, axis, level, drop_level)
   4318             new_index = index[loc]
   4319 else:
-> 4320     loc = index.get_loc(key)
   4322     if isinstance(loc, np.ndarray):
   4323         if loc.dtype == np.bool_:

File ~/work/pandas/pandas/pandas/core/indexes/interval.py:679, in IntervalIndex.get_loc(self, key)
    677 matches = mask.sum()
    678 if matches == 0:
--> 679     raise KeyError(key)
    680 if matches == 1:
    681     return mask.argmax()

KeyError: Interval(0.5, 2.5, closed='right')

To select all Intervals that overlap a given Interval, use the overlaps() method to create a boolean indexer.

In [190]: idxr = df.index.overlaps(pd.Interval(0.5, 2.5))

In [191]: idxr
Out[191]: array([ True,  True,  True, False])

In [192]: df[idxr]
Out[192]: 
        A
(0, 1]  1
(1, 2]  2
(2, 3]  3

Binning data with `cut` and `qcut`#

cut() and qcut() both return a Categorical object, and the bins they create are stored as an IntervalIndex in its .categories attribute.

In [193]: c = pd.cut(range(4), bins=2)

In [194]: c
Out[194]: 
[(-0.003, 1.5], (-0.003, 1.5], (1.5, 3.0], (1.5, 3.0]]
Categories (2, interval[float64, right]): [(-0.003, 1.5] < (1.5, 3.0]]

In [195]: c.categories
Out[195]: IntervalIndex([(-0.003, 1.5], (1.5, 3.0]], dtype='interval[float64, right]')

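cut() also accepts an IntervalIndex for its bins argument, which enables a useful pandas idiom. First, call cut() with some data and bins set to a fixed number to generate the bins. Then, pass the values of .categories as the bins argument in subsequent calls to cut(), supplying new data that will be binned into the same bins.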

In [196]: pd.cut([0, 3, 5, 1], bins=c.categories)
Out[196]: 
[(-0.003, 1.5], (1.5, 3.0], NaN, (-0.003, 1.5]]
Categories (2, interval[float64, right]): [(-0.003, 1.5] < (1.5, 3.0]]

Any value that falls outside all bins is assigned a NaN value.
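The same idiom works with bins learned by qcut(). In this sketch (the data is illustrative), equal-frequency bins are derived from one dataset and then reused for new values; values outside every bin get code -1, which displays as NaN.

```python
import pandas as pd

train = [1, 2, 3, 4, 5, 6, 7, 8]
# Learn two equal-frequency bins from one dataset
binned = pd.qcut(train, q=2)
bins = binned.categories  # an IntervalIndex

# Reuse exactly the same bins for new data; values outside every bin become NaN
new = pd.cut([0, 2, 8, 9], bins=bins)
print(bins)
print(new)
```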

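Generating ranges of intervals#

If we need intervals on a regular frequency, we can use the interval_range() function to create an IntervalIndex using various combinations of start, end, and periods. The default frequency for interval_range is 1 for numeric intervals and calendar day for datetime-like intervals: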

In [197]: pd.interval_range(start=0, end=5)
Out[197]: IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]], dtype='interval[int64, right]')

In [198]: pd.interval_range(start=pd.Timestamp("2017-01-01"), periods=4)
Out[198]: 
IntervalIndex([(2017-01-01 00:00:00, 2017-01-02 00:00:00],
               (2017-01-02 00:00:00, 2017-01-03 00:00:00],
               (2017-01-03 00:00:00, 2017-01-04 00:00:00],
               (2017-01-04 00:00:00, 2017-01-05 00:00:00]],
              dtype='interval[datetime64[ns], right]')

In [199]: pd.interval_range(end=pd.Timedelta("3 days"), periods=3)
Out[199]: 
IntervalIndex([(0 days 00:00:00, 1 days 00:00:00],
               (1 days 00:00:00, 2 days 00:00:00],
               (2 days 00:00:00, 3 days 00:00:00]],
              dtype='interval[timedelta64[ns], right]')

The freq parameter can be used to specify non-default frequencies, and datetime-like intervals can use a variety of frequency aliases:

In [200]: pd.interval_range(start=0, periods=5, freq=1.5)
Out[200]: IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0], (6.0, 7.5]], dtype='interval[float64, right]')

In [201]: pd.interval_range(start=pd.Timestamp("2017-01-01"), periods=4, freq="W")
Out[201]: 
IntervalIndex([(2017-01-01 00:00:00, 2017-01-08 00:00:00],
               (2017-01-08 00:00:00, 2017-01-15 00:00:00],
               (2017-01-15 00:00:00, 2017-01-22 00:00:00],
               (2017-01-22 00:00:00, 2017-01-29 00:00:00]],
              dtype='interval[datetime64[ns], right]')

In [202]: pd.interval_range(start=pd.Timedelta("0 days"), periods=3, freq="9h")
Out[202]: 
IntervalIndex([(0 days 00:00:00, 0 days 09:00:00],
               (0 days 09:00:00, 0 days 18:00:00],
               (0 days 18:00:00, 1 days 03:00:00]],
              dtype='interval[timedelta64[ns], right]')

Additionally, the closed parameter can be used to specify which side(s) the intervals are closed on. Intervals are closed on the right side by default.

In [203]: pd.interval_range(start=0, end=4, closed="both")
Out[203]: IntervalIndex([[0, 1], [1, 2], [2, 3], [3, 4]], dtype='interval[int64, both]')

In [204]: pd.interval_range(start=0, end=4, closed="neither")
Out[204]: IntervalIndex([(0, 1), (1, 2), (2, 3), (3, 4)], dtype='interval[int64, neither]')

Specifying start, end, and periods generates a range of evenly spaced intervals from start to end inclusively, with periods elements in the resulting IntervalIndex:

In [205]: pd.interval_range(start=0, end=6, periods=4)
Out[205]: IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]], dtype='interval[float64, right]')

In [206]: pd.interval_range(pd.Timestamp("2018-01-01"), pd.Timestamp("2018-02-28"), periods=3)
Out[206]: 
IntervalIndex([(2018-01-01 00:00:00, 2018-01-20 08:00:00],
               (2018-01-20 08:00:00, 2018-02-08 16:00:00],
               (2018-02-08 16:00:00, 2018-02-28 00:00:00]],
              dtype='interval[datetime64[ns], right]')
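For numeric endpoints, the evenly spaced range is equivalent to building the breaks yourself with np.linspace and passing them to IntervalIndex.from_breaks. The sketch checks that equivalence and also shows how closed="left" moves the inclusive end.

```python
import numpy as np
import pandas as pd

# start, end and periods together: evenly spaced breaks, like np.linspace
ii = pd.interval_range(start=0, end=6, periods=4)
manual = pd.IntervalIndex.from_breaks(np.linspace(0, 6, 5))

# closed="left" makes the left endpoints inclusive
left_closed = pd.interval_range(start=0, end=4, closed="left")
print(ii)
print(left_closed.contains(0))
```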

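Miscellaneous indexing FAQ#

Integer indexing#

Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general view is that labels matter more than integer locations. Therefore, with an integer axis index, only label-based indexing is possible with the standard tools like .loc. In the code below, s[-1] raises a KeyError because -1 is not a label, and df.loc[-2:] does not return the last two rows: it treats -2 as a label and returns every row.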

In [207]: s = pd.Series(range(5))

In [208]: s[-1]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/work/pandas/pandas/pandas/core/indexes/range.py:413, in RangeIndex.get_loc(self, key)
    412 try:
--> 413     return self._range.index(new_key)
    414 except ValueError as err:

ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[208], line 1
----> 1 s[-1]

File ~/work/pandas/pandas/pandas/core/series.py:1130, in Series.__getitem__(self, key)
   1127     return self._values[key]
   1129 elif key_is_scalar:
-> 1130     return self._get_value(key)
   1132 # Convert generator to list before going through hashable part
   1133 # (We will iterate through the generator there to check for slices)
   1134 if is_iterator(key):

File ~/work/pandas/pandas/pandas/core/series.py:1246, in Series._get_value(self, label, takeable)
   1243     return self._values[label]
   1245 # Similar to Index.get_value, but we do not fall back to positional
-> 1246 loc = self.index.get_loc(label)
   1248 if is_integer(loc):
   1249     return self._values[loc]

File ~/work/pandas/pandas/pandas/core/indexes/range.py:415, in RangeIndex.get_loc(self, key)
    413         return self._range.index(new_key)
    414     except ValueError as err:
--> 415         raise KeyError(key) from err
    416 if isinstance(key, Hashable):
    417     raise KeyError(key)

KeyError: -1

In [209]: df = pd.DataFrame(np.random.randn(5, 4))

In [210]: df
Out[210]: 
          0         1         2         3
0 -0.435772 -1.188928 -0.808286 -0.284634
1 -1.815703  1.347213 -0.243487  0.514704
2  1.162969 -0.287725 -0.179734  0.993962
3 -0.212673  0.909872 -0.733333 -0.349893
4  0.456434 -0.306735  0.553396  0.166221

In [211]: df.loc[-2:]
Out[211]: 
          0         1         2         3
0 -0.435772 -1.188928 -0.808286 -0.284634
1 -1.815703  1.347213 -0.243487  0.514704
2  1.162969 -0.287725 -0.179734  0.993962
3 -0.212673  0.909872 -0.733333 -0.349893
4  0.456434 -0.306735  0.553396  0.166221
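When positional access is what you want on an integer-labelled object, .iloc makes that explicit. A short sketch contrasting .iloc with .loc on the same kind of objects (the DataFrame values are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5))

# Positional access is explicit with .iloc, so negative positions work
last = s.iloc[-1]
tail = s.iloc[-2:]

# With .loc, -1 is a *label*; it is absent from the integer index, so KeyError
try:
    s.loc[-1]
    label_error = False
except KeyError:
    label_error = True

df = pd.DataFrame(np.arange(20).reshape(5, 4))
last_rows = df.iloc[-2:]  # the last two rows, unlike df.loc[-2:]
print(last, tail.tolist(), label_error)
```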

This deliberate decision was made to prevent ambiguities and subtle bugs (many users reported finding bugs when the API change was made to stop "falling back" on position-based indexing).

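Non-monotonic indexes require exact matches#

If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing and is_monotonic_decreasing attributes.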

In [212]: df = pd.DataFrame(index=[2, 3, 3, 4, 5], columns=["data"], data=list(range(5)))

In [213]: df.index.is_monotonic_increasing
Out[213]: True

# no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
In [214]: df.loc[0:4, :]
Out[214]: 
   data
2     0
3     1
3     2
4     3

# slice bounds are outside the index, so an empty DataFrame is returned
In [215]: df.loc[13:15, :]
Out[215]: 
Empty DataFrame
Columns: [data]
Index: []

On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.

In [216]: df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5], columns=["data"], data=list(range(6)))

In [217]: df.index.is_monotonic_increasing
Out[217]: False

# OK because 2 and 4 are in the index
In [218]: df.loc[2:4, :]
Out[218]: 
   data
2     0
3     1
1     2
4     3

# 0 is not in the index
In [219]: df.loc[0:4, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/work/pandas/pandas/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811 try:
-> 3812     return self._engine.get_loc(casted_key)
   3813 except KeyError as err:

File ~/work/pandas/pandas/pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File ~/work/pandas/pandas/pandas/_libs/index.pyx:191, in pandas._libs.index.IndexEngine.get_loc()

File ~/work/pandas/pandas/pandas/_libs/index.pyx:234, in pandas._libs.index.IndexEngine._get_loc_duplicates()

File ~/work/pandas/pandas/pandas/_libs/index.pyx:242, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer()

File ~/work/pandas/pandas/pandas/_libs/index.pyx:134, in pandas._libs.index._unpack_bool_indexer()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[219], line 1
----> 1 df.loc[0:4, :]

File ~/work/pandas/pandas/pandas/core/indexing.py:1184, in _LocationIndexer.__getitem__(self, key)
   1182     if self._is_scalar_access(key):
   1183         return self.obj._get_value(*key, takeable=self._takeable)
-> 1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis
   1187     axis = self.axis or 0

File ~/work/pandas/pandas/pandas/core/indexing.py:1377, in _LocIndexer._getitem_tuple(self, tup)
   1374 if self._multi_take_opportunity(tup):
   1375     return self._multi_take(tup)
-> 1377 return self._getitem_tuple_same_dim(tup)

File ~/work/pandas/pandas/pandas/core/indexing.py:1020, in _LocationIndexer._getitem_tuple_same_dim(self, tup)
   1017 if com.is_null_slice(key):
   1018     continue
-> 1020 retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
   1021 # We should never have retval.ndim < self.ndim, as that should
   1022 #  be handled by the _getitem_lowerdim call above.
   1023 assert retval.ndim == self.ndim

File ~/work/pandas/pandas/pandas/core/indexing.py:1411, in _LocIndexer._getitem_axis(self, key, axis)
   1409 if isinstance(key, slice):
   1410     self._validate_key(key, axis)
-> 1411     return self._get_slice_axis(key, axis=axis)
   1412 elif com.is_bool_indexer(key):
   1413     return self._getbool_axis(key, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1443, in _LocIndexer._get_slice_axis(self, slice_obj, axis)
   1440     return obj.copy(deep=False)
   1442 labels = obj._get_axis(axis)
-> 1443 indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1445 if isinstance(indexer, slice):
   1446     return self.obj._slice(indexer, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6678, in Index.slice_indexer(self, start, end, step)
   6634 def slice_indexer(
   6635     self,
   6636     start: Hashable | None = None,
   6637     end: Hashable | None = None,
   6638     step: int | None = None,
   6639 ) -> slice:
   6640     """
   6641     Compute the slice indexer for input labels and step.
   6642 
   (...)
   6676     slice(1, 3, None)
   6677     """
-> 6678     start_slice, end_slice = self.slice_locs(start, end, step=step)
   6680     # return a slice
   6681     if not is_scalar(start_slice):

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6904, in Index.slice_locs(self, start, end, step)
   6902 start_slice = None
   6903 if start is not None:
-> 6904     start_slice = self.get_slice_bound(start, "left")
   6905 if start_slice is None:
   6906     start_slice = 0

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6829, in Index.get_slice_bound(self, label, side)
   6826         return self._searchsorted_monotonic(label, side)
   6827     except ValueError:
   6828         # raise the original KeyError
-> 6829         raise err
   6831 if isinstance(slc, np.ndarray):
   6832     # get_loc may return a boolean array, which
   6833     # is OK as long as they are representable by a slice.
   6834     assert is_bool_dtype(slc.dtype)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6823, in Index.get_slice_bound(self, label, side)
   6821 # we need to look up the label
   6822 try:
-> 6823     slc = self.get_loc(label)
   6824 except KeyError as err:
   6825     try:

File ~/work/pandas/pandas/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3814     if isinstance(casted_key, slice) or (
   3815         isinstance(casted_key, abc.Iterable)
   3816         and any(isinstance(x, slice) for x in casted_key)
   3817     ):
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.
   3824     self._check_indexing_error(key)

KeyError: 0

# 3 is not a unique label
In [220]: df.loc[2:3, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[220], line 1
----> 1 df.loc[2:3, :]

File ~/work/pandas/pandas/pandas/core/indexing.py:1184, in _LocationIndexer.__getitem__(self, key)
   1182     if self._is_scalar_access(key):
   1183         return self.obj._get_value(*key, takeable=self._takeable)
-> 1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis
   1187     axis = self.axis or 0

File ~/work/pandas/pandas/pandas/core/indexing.py:1377, in _LocIndexer._getitem_tuple(self, tup)
   1374 if self._multi_take_opportunity(tup):
   1375     return self._multi_take(tup)
-> 1377 return self._getitem_tuple_same_dim(tup)

File ~/work/pandas/pandas/pandas/core/indexing.py:1020, in _LocationIndexer._getitem_tuple_same_dim(self, tup)
   1017 if com.is_null_slice(key):
   1018     continue
-> 1020 retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
   1021 # We should never have retval.ndim < self.ndim, as that should
   1022 #  be handled by the _getitem_lowerdim call above.
   1023 assert retval.ndim == self.ndim

File ~/work/pandas/pandas/pandas/core/indexing.py:1411, in _LocIndexer._getitem_axis(self, key, axis)
   1409 if isinstance(key, slice):
   1410     self._validate_key(key, axis)
-> 1411     return self._get_slice_axis(key, axis=axis)
   1412 elif com.is_bool_indexer(key):
   1413     return self._getbool_axis(key, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexing.py:1443, in _LocIndexer._get_slice_axis(self, slice_obj, axis)
   1440     return obj.copy(deep=False)
   1442 labels = obj._get_axis(axis)
-> 1443 indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1445 if isinstance(indexer, slice):
   1446     return self.obj._slice(indexer, axis=axis)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6678, in Index.slice_indexer(self, start, end, step)
   6634 def slice_indexer(
   6635     self,
   6636     start: Hashable | None = None,
   6637     end: Hashable | None = None,
   6638     step: int | None = None,
   6639 ) -> slice:
   6640     """
   6641     Compute the slice indexer for input labels and step.
   6642 
   (...)
   6676     slice(1, 3, None)
   6677     """
-> 6678     start_slice, end_slice = self.slice_locs(start, end, step=step)
   6680     # return a slice
   6681     if not is_scalar(start_slice):

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6910, in Index.slice_locs(self, start, end, step)
   6908 end_slice = None
   6909 if end is not None:
-> 6910     end_slice = self.get_slice_bound(end, "right")
   6911 if end_slice is None:
   6912     end_slice = len(self)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:6837, in Index.get_slice_bound(self, label, side)
   6835     slc = lib.maybe_booleans_to_slice(slc.view("u1"))
   6836     if isinstance(slc, np.ndarray):
-> 6837         raise KeyError(
   6838             f"Cannot get {side} slice bound for non-unique "
   6839             f"label: {repr(original_label)}"
   6840         )
   6842 if isinstance(slc, slice):
   6843     if side == "left":

KeyError: 'Cannot get right slice bound for non-unique label: 3'
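Two common ways around these errors are sketched below with the same index values: sort first, so the index becomes monotonic and the bounds no longer need to be present or unique, or keep the original row order and filter with a boolean mask. Which one fits depends on whether the original order matters.

```python
import pandas as pd

df = pd.DataFrame({"data": range(6)}, index=[2, 3, 1, 4, 3, 5])

# Option 1: sort first; the index is then monotonic, so the bounds need not exist
sorted_slice = df.sort_index().loc[0:3]

# Option 2: keep the original row order and filter with a boolean mask
mask = (df.index >= 0) & (df.index <= 3)
masked = df[mask]

print(sorted_slice)
print(masked)
```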

Index.is_monotonic_increasing and Index.is_monotonic_decreasing only check that an index is weakly monotonic. To check for strict monotonicity, combine one of them with the is_unique attribute.

In [221]: weakly_monotonic = pd.Index(["a", "b", "c", "c"])

In [222]: weakly_monotonic
Out[222]: Index(['a', 'b', 'c', 'c'], dtype='object')

In [223]: weakly_monotonic.is_monotonic_increasing
Out[223]: True

In [224]: weakly_monotonic.is_monotonic_increasing & weakly_monotonic.is_unique
Out[224]: False
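The combination above can be packaged as a small predicate. `is_strictly_increasing` is a hypothetical helper name used only for illustration, not part of pandas.

```python
import pandas as pd


def is_strictly_increasing(idx: pd.Index) -> bool:
    """Hypothetical helper: weakly increasing and free of duplicates."""
    return idx.is_monotonic_increasing and idx.is_unique


print(is_strictly_increasing(pd.Index(["a", "b", "c", "c"])))
print(is_strictly_increasing(pd.Index(["a", "b", "c"])))
```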

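Endpoints are inclusive#

Compared with standard Python sequence slicing, in which the slice endpoint is not inclusive, label-based slicing in pandas is inclusive. The primary reason is that it is often not possible to easily determine the "successor" or next element after a particular label in an index. For example, consider the following Series: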

In [225]: s = pd.Series(np.random.randn(6), index=list("abcdef"))

In [226]: s
Out[226]: 
a   -0.101684
b   -0.734907
c   -0.130121
d   -0.476046
e    0.759104
f    0.213379
dtype: float64

Suppose we wished to slice from c to e. Using integers, this would be accomplished as follows:

In [227]: s[2:5]
Out[227]: 
c   -0.130121
d   -0.476046
e    0.759104
dtype: float64

However, if you only had c and e, determining the next element in the index can be somewhat complicated. For example, the following does not work:

In [228]: s.loc['c':'e' + 1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[228], line 1
----> 1 s.loc['c':'e' + 1]

TypeError: can only concatenate str (not "int") to str

A very common use case is to limit a time series to start and end at two specific dates. To enable this, we made the design choice to make label-based slicing include both endpoints:

In [229]: s.loc["c":"e"]
Out[229]: 
c   -0.130121
d   -0.476046
e    0.759104
dtype: float64
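The time series case mentioned above can be sketched with a small daily series (the dates are illustrative). The label slice includes both dates, while the equivalent positional slice has to name a stop position one past the last row.

```python
import pandas as pd

ts = pd.Series(range(5), index=pd.date_range("2024-01-01", periods=5, freq="D"))

# Label-based slice: both the start and the end date are included
window = ts.loc["2024-01-02":"2024-01-04"]

# Positional slice over the same rows: the stop position is excluded
positional = ts.iloc[1:4]
print(window)
```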

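This is most definitely a "practicality beats purity" sort of thing, but it is something to watch out for if you expect label-based slicing to behave exactly like standard Python integer slicing.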

Indexing potentially changes underlying Series dtype#

Different indexing operations can change the dtype of a Series.

In [230]: series1 = pd.Series([1, 2, 3])

In [231]: series1.dtype
Out[231]: dtype('int64')

In [232]: res = series1.reindex([0, 4])

In [233]: res.dtype
Out[233]: dtype('float64')

In [234]: res
Out[234]: 
0    1.0
4    NaN
dtype: float64

In [235]: series2 = pd.Series([True])

In [236]: series2.dtype
Out[236]: dtype('bool')

In [237]: res = series2.reindex_like(series1)

In [238]: res.dtype
Out[238]: dtype('O')

In [239]: res
Out[239]: 
0    True
1     NaN
2     NaN
dtype: object

This is because the (re)indexing operations above silently insert NaNs, and the dtype changes accordingly. This can cause problems when using NumPy ufuncs such as numpy.logical_and.

See GH 2388 for a more detailed discussion.
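If the upcast is unwanted, two options are sketched below: pass fill_value to reindex so that no NaN is inserted, or work with pandas nullable extension dtypes ("Int64", "boolean"), which mark missing values as <NA> instead of changing the dtype. These are general pandas features offered as alternatives, not the resolution discussed in GH 2388.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # int64

# Supplying fill_value avoids introducing NaN, so the integer dtype survives
filled = s.reindex([0, 4], fill_value=0)

# Nullable extension dtypes represent missing values as <NA> without upcasting
nullable = s.astype("Int64").reindex([0, 4])
flags = pd.Series([True], dtype="boolean").reindex([0, 1, 2])

print(filled.dtype, nullable.dtype, flags.dtype)
```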