
**Python Pandas:**

Pandas is a Python library that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the basic high-level building block for practical, real-world data analysis in Python. The two primary data structures of Pandas, Series (one-dimensional) and DataFrame (two-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. Pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment alongside many other third-party libraries.

Pandas is well suited for inserting and deleting columns in a DataFrame, handling missing data (represented as NaN), aligning data to a set of labels, converting data held in other Python and NumPy data structures into DataFrame objects, label-based slicing, indexing, and subsetting of large data sets, merging and joining data sets, and flexible reshaping. It also has robust input/output tools for loading data from CSV files, Excel files, databases, and other formats. To use the functions and data structures that Pandas defines, you first have to import the library.

    import pandas as pd

By convention, *pandas* is imported under the alias *pd*.
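As a rough illustration of the DataFrame features listed above (inserting and deleting columns, and NaN as the missing-data marker), here is a minimal sketch. The city names and temperatures are made-up sample data, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame(
    {"city": ["Pune", "Oslo", "Lima"], "temp_c": [31.0, np.nan, 19.5]}
)

# Insert a column computed from an existing one; NaN propagates through the arithmetic.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Delete a column; drop() returns a new DataFrame and leaves df unchanged.
without_c = df.drop(columns=["temp_c"])

# Count the missing values in each column.
missing_per_column = df.isna().sum()

print(without_c)
print(missing_per_column)
```

Because drop() returns a new object, the original df still has all three columns after this runs.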

**Python Pandas Series:**

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. A Pandas Series is created with the Series() constructor, and its syntax is:

    s = pd.Series(data, index=None)

Here, s is the Pandas Series, and data can be a Python dict, an ndarray, or a scalar value (like 5). The passed index is a list of axis labels. Both integer and label-based indexing are supported. If no index is passed and data is an ndarray, the index defaults to range(n), where n is the length of data.

**Create Series from ndarrays**

    >>> import numpy as np
    >>> import pandas as pd
    >>> s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    >>> type(s)
    <class 'pandas.core.series.Series'>
    >>> s
    a   -0.367740
    b    0.855453
    c   -0.518004
    d   -0.060861
    e   -0.277982
    dtype: float64
    >>> s.index
    Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
    >>> s.values
    array([-0.367740, 0.855453, -0.518004, -0.060861, -0.277982])
    >>> pd.Series(np.random.randn(5))
    0    0.334947
    1   -2.184006
    2   -0.209440
    3   -0.492398
    4   -1.507088
    dtype: float64

The example imports NumPy and Pandas, then creates a Series from an *ndarray* (NumPy's array class) with the *Series()* constructor, which returns a Pandas Series *s*. You can also specify axis labels through *index*, here index=['a', 'b', 'c', 'd', 'e']. When data is an *ndarray*, the *index* must be the same length as data. The values in *s* have *dtype: float64* because np.random.randn() returns floating-point numbers. The *index* attribute returns the index of a Series. The *values* attribute returns an *ndarray* containing only the values, without the axis labels. If no index labels are passed, a default index with the values [0, ..., len(data) - 1] is created.
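The length rule and the default index can be checked with a short script. This is a minimal sketch; the array contents and the labels 'x', 'y', 'z' are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# No index passed: pandas assigns the default labels 0, 1, 2.
default_labels = pd.Series(values)
print(list(default_labels.index))

# Index with a matching length: each value gets its own label.
labeled = pd.Series(values, index=["x", "y", "z"])
print(labeled["y"])

# Index with the wrong length: construction fails with a ValueError.
try:
    pd.Series(values, index=["x", "y"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```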

**Python Pandas Create Series from Dictionaries**

    >>> import numpy as np
    >>> import pandas as pd
    >>> d = {'a' : 0., 'b' : 1., 'c' : 2.}
    >>> pd.Series(d)
    a    0.0
    b    1.0
    c    2.0
    dtype: float64
    >>> pd.Series(d, index=['b', 'c', 'd', 'a'])
    b    1.0
    c    2.0
    d    NaN
    a    0.0
    dtype: float64

A Series can be created from a dictionary by passing the dictionary to the Series() constructor. By default, the dictionary keys become the index labels. If index labels are also passed, the values corresponding to those labels are pulled out of the dictionary, and the order of the passed labels is preserved. If a label has no matching key, its value is NaN. NaN (not a number) is the standard missing data marker used in pandas.
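Once a Series contains NaN, you usually want to find or handle the missing entries. The sketch below shows one way to do that with isna(), dropna(), and fillna(); the stock dictionary and its labels are hypothetical sample data.

```python
import pandas as pd

stock = {"apples": 12.0, "pears": 7.0, "plums": 3.0}  # hypothetical data

# Request labels in a chosen order, including one that is not a key.
s = pd.Series(stock, index=["plums", "kiwis", "apples"])

# isna() flags the labels that had no matching key in the dictionary.
missing_labels = list(s[s.isna()].index)
print(missing_labels)

# dropna() removes those entries; fillna() substitutes a value instead.
present = s.dropna()
filled = s.fillna(0.0)
print(list(present.index))
print(filled["kiwis"])
```

Note that the key "pears" does not appear in the result at all, because it was not among the requested labels.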

**Create Series from Scalar data**

    >>> import numpy as np
    >>> import pandas as pd
    >>> pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
    a    5.0
    b    5.0
    c    5.0
    d    5.0
    e    5.0
    dtype: float64

You can create a Pandas Series from a scalar value. Here the scalar value is 5.0. When data is a scalar, provide an index: the value is repeated to match the length of the index.
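A common use of this is initializing every label with the same starting value. The following is a minimal sketch; the day labels and the counter are hypothetical.

```python
import pandas as pd

labels = ["mon", "tue", "wed"]  # hypothetical labels

# The scalar 0 is repeated once per label.
counts = pd.Series(0, index=labels)

# Each entry is independent, so updating one leaves the others unchanged.
counts["tue"] += 5
print(counts.to_dict())
```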

**Python Pandas Series Indexing and Slicing**

    >>> import numpy as np
    >>> import pandas as pd
    >>> s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    >>> s
    a    0.481557
    b    2.053330
    c   -1.799993
    d   -0.396880
    e   -1.270751
    dtype: float64
    >>> s.iloc[0]  # positional access; plain s[0] is deprecated on a label index
    0.48155677569897515
    >>> s[1:3]
    b    2.053330
    c   -1.799993
    dtype: float64
    >>> s[:3]
    a    0.481557
    b    2.053330
    c   -1.799993
    dtype: float64
    >>> s[s > .5]
    b    2.05333
    dtype: float64
    >>> s.iloc[[4, 3, 1]]  # list of positions, returned in the given order
    e   -1.270751
    d   -0.396880
    b    2.053330
    dtype: float64
    >>> s['a']
    0.48155677569897515
    >>> s['e']
    -1.270750548062543
    >>> 'e' in s
    True
    >>> 'f' in s
    False

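You can select a single element or a slice of a Pandas Series by position. On a Series with a non-integer index, plain integer keys such as s[0] and s[[4, 3, 1]] have historically fallen back to positional lookup; recent pandas releases deprecate that fallback, so s.iloc[0] and s.iloc[[4, 3, 1]] are the explicit positional forms. Boolean array indexing is also supported: s[s > .5] keeps only the elements whose values are greater than 0.5. Multiple positions are specified as a list, and the result follows the order of that list. The index can also be a label: s['a'] and s['e'] return the values associated with those labels. Use the *in* operator to check whether a label is present in a Series.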
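To make the difference between positional and label-based access explicit, the sketch below uses the .iloc and .loc accessors on a small Series with fixed values. The values and variable names are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([1.5, -0.2, 3.0, 0.7], index=["a", "b", "c", "d"])

first_by_position = s.iloc[0]   # element at position 0
first_by_label = s.loc["a"]     # element with label "a"

# Positional slices exclude the stop position...
positional = s.iloc[1:3]        # labels b, c
# ...while label slices include the stop label.
by_label = s.loc["b":"d"]       # labels b, c, d

# A Boolean mask keeps the entries where the condition is True.
positive = s[s > 0]

# Membership tests look at the index labels, not the values.
has_label = "a" in s
has_value = 1.5 in s

print(list(positional.index), list(by_label.index))
print(list(positive.index), has_label, has_value)
```

The in operator checks the index, so 1.5 in s is False even though 1.5 is one of the values.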

**Python Pandas Working with Text Data**

The Pandas Series supports a set of string processing methods that make it easy to operate on each element of the array. These methods are accessible via the *str *attribute and they generally have the same name as that of the built-in Python string methods.

    >>> import numpy as np
    >>> import pandas as pd
    >>> empires_ds = pd.Series(["Vijayanagara", "Roman", "Chola", "Mongol", "Akkadian"])
    >>> empires_ds.str.lower()
    0    vijayanagara
    1           roman
    2           chola
    3          mongol
    4        akkadian
    dtype: object
    >>> empires_ds.str.upper()
    0    VIJAYANAGARA
    1           ROMAN
    2           CHOLA
    3          MONGOL
    4        AKKADIAN
    dtype: object
    >>> empires_ds.str.len()
    0    12
    1     5
    2     5
    3     6
    4     8
    dtype: int64
    >>> tennis_ds = pd.Series([' Seles ', ' Graph ', ' Williams '])
    >>> tennis_ds.str.strip()
    0       Seles
    1       Graph
    2    Williams
    dtype: object
    >>> tennis_ds.str.contains(' ')
    0    True
    1    True
    2    True
    dtype: bool
    >>> marvel_ds = pd.Series(['Thor_loki', 'Thor_Hulk', 'Gamora_Storm'])
    >>> marvel_ds.str.split('_')
    0       [Thor, loki]
    1       [Thor, Hulk]
    2    [Gamora, Storm]
    dtype: object
    >>> planets = pd.Series(["Venus", "Earth", "Saturn"])
    >>> planets.str.replace("Earth", "Mars")
    0     Venus
    1      Mars
    2    Saturn
    dtype: object
    >>> letters_ds = pd.Series(['a', 'b', 'c', 'd'])
    >>> letters_ds.str.cat(sep=',')
    'a,b,c,d'
    >>> names_ds = pd.Series(['Jahnavi', 'Adelmo', 'Pietro', 'Alejandro'])
    >>> names_ds.str.count('e')
    0    0
    1    1
    2    1
    3    1
    dtype: int64
    >>> names_ds.str.startswith('A')
    0    False
    1     True
    2    False
    3     True
    dtype: bool
    >>> names_ds.str.endswith('O')
    0    False
    1    False
    2    False
    3    False
    dtype: bool
    >>> names_ds.str.find('J')
    0     0
    1    -1
    2    -1
    3    -1
    dtype: int64

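The example above demonstrates several of these string methods: lower() and upper() change case, len() returns the length of each string, strip() removes surrounding whitespace, contains() tests for a substring, split() breaks each string into a list, replace() substitutes text, cat() concatenates the elements into a single string, count() counts occurrences, startswith() and endswith() test prefixes and suffixes, and find() returns the position of the first match, or -1 if there is none. The comparisons are case-sensitive, which is why endswith('O') returns False for every name.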
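Because each *str* method returns a new Series, the calls can be chained, and missing values pass through unchanged. The sketch below cleans up a few names this way. The sample names are made up, and regex=False is passed explicitly so that the pattern is treated as a literal string regardless of the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with stray whitespace, underscores, and a missing entry.
raw = pd.Series(["  Ada_Lovelace ", "alan_turing", np.nan])

# Chain string methods: trim, replace underscores with spaces, capitalize each word.
cleaned = raw.str.strip().str.replace("_", " ", regex=False).str.title()
print(cleaned)

# na=False makes contains() return False, rather than NaN, for missing entries.
has_space = cleaned.str.contains(" ", na=False)
print(has_space.tolist())
```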