Unit 2 - Operations on a Series
CBSE Revision Notes
Class-11 Informatics Practices (New Syllabus)
Unit 2: Data Handling (DH-1)
Operations on a Series
A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
pandas.Series
A pandas Series can be created using the following constructor −
pandas.Series( data, index, dtype, copy)
The parameters of the constructor are as follows −
S.No | Parameter & Description |
---|---|
1 | data - data takes various forms like ndarray, list, constants |
2 | index - Index values must be hashable and the same length as data; they need not be unique. Defaults to np.arange(n) if no index is passed. |
3 | dtype - dtype is for data type. If None, data type will be inferred |
4 | copy - Copy data. Default False |
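As a rough illustration of how the four constructor parameters fit together, the sketch below builds a Series from a plain Python list. The variable names (marks, roll_numbers) and the values are made up for this example and are not part of the original notes.

```python
import pandas as pd

# Hypothetical data: marks of four students (values invented for illustration)
marks = [72, 85, 90, 64]
roll_numbers = ['r1', 'r2', 'r3', 'r4']   # one label per value

# data  -> the list of marks
# index -> the labels attached to each value
# dtype -> force floating-point storage instead of the inferred integer type
# copy  -> request a copy of the input data
s = pd.Series(data=marks, index=roll_numbers, dtype='float64', copy=True)

print(s)
print(s.dtype)   # float64
print(len(s))    # 4
```

Leaving out dtype here would let pandas infer an integer type from the list.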
A series can be created using various inputs like −
- Array
- Dict
- Scalar value or constant
Create an Empty Series
The simplest Series that can be created is an empty Series.
Example
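# import the pandas library, aliased as pd
import pandas as pd
# dtype is given explicitly: recent pandas versions no longer default an empty Series to float64
s = pd.Series(dtype='float64')
print(s)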
Its output is as follows −
Series([], dtype: float64)
Create a Series from ndarray
If data is an ndarray, then the index passed must be of the same length. If no index is passed, the default index is range(n), where n is the array length, i.e., [0, 1, 2, ..., len(array)-1].
Example 1
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Its output is as follows −
0    a
1    b
2    c
3    d
dtype: object
We did not pass any index, so by default, it assigned the indexes ranging from 0 to len(data)-1, i.e., 0 to 3.
Example 2
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)
Its output is as follows −
100    a
101    b
102    c
103    d
dtype: object
We passed the index values here, so the output shows the customized index values instead of the default 0 to 3.
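The rule that the index must have the same length as the ndarray can be checked with a small sketch like the one below. It reuses the data array from the examples above; the variable mismatch_error is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

# Matching lengths: four values, four labels
s = pd.Series(data, index=[100, 101, 102, 103])
print(s.loc[101])   # label-based lookup

# Mismatched lengths: four values but only three labels
try:
    pd.Series(data, index=[100, 101, 102])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```

The exact wording of the error message differs between pandas versions, so only the exception type is relied on here.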
Create a Series from dict
A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (in sorted order on older pandas versions, in the dictionary's insertion order on recent ones). If an index is passed, the values in data corresponding to the labels in the index are pulled out.
Example 1
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
Its output is as follows −
a    0.0
b    1.0
c    2.0
dtype: float64
Observe − Dictionary keys are used to construct index.
Example 2
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
Its output is as follows −
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
Observe − The order of the passed index is preserved, and the label with no matching key ('d') is filled with NaN (Not a Number).
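To make the label lookup more concrete, here is a minimal sketch, assuming a recent pandas version, that builds a Series from a dict with an explicit index and then locates the missing entry. The use of isna() and dropna() is an addition for illustration and goes slightly beyond the original notes.

```python
import pandas as pd

data = {'c': 2.0, 'a': 0.0, 'b': 1.0}

# No index: labels come from the dictionary keys
s1 = pd.Series(data)

# Explicit index: values are looked up by label; 'd' has no entry in data
s2 = pd.Series(data, index=['b', 'c', 'd', 'a'])

print(list(s2.index))   # ['b', 'c', 'd', 'a']
print(s2.isna())        # True only for label 'd'
print(s2.dropna())      # keeps the three labels that were found
```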
Create a Series from Scalar
If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Its output is as follows −
0    5
1    5
2    5
3    5
dtype: int64
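The same repetition works with any hashable labels, not only integers. A small sketch, with hypothetical day labels chosen for illustration:

```python
import pandas as pd

labels = ['mon', 'tue', 'wed']      # hypothetical labels
s = pd.Series(0.5, index=labels)    # the scalar 0.5 is repeated once per label

print(len(s) == len(labels))   # True
print(s['tue'])                # 0.5
```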
pandas.Series.head
Series.head(n=5)
Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
Parameters: | n : int, default 5. Number of rows to select. |
Returns: | obj_head : type of caller. The first n rows of the caller object. |
See also: tail, which returns the last n rows.
Examples
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey',
...                               'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
Viewing the first 5 lines
>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
Viewing the first n lines (three in this case)
>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon
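The examples above call head() on a DataFrame. Since this unit is about Series, the following sketch applies the same method to a Series holding the same animal names; the variable names are chosen here for illustration.

```python
import pandas as pd

animals = pd.Series(['alligator', 'bee', 'falcon', 'lion', 'monkey',
                     'parrot', 'shark', 'whale', 'zebra'])

first_five = animals.head()     # n defaults to 5
first_three = animals.head(3)   # custom n

print(first_three.tolist())     # ['alligator', 'bee', 'falcon']
print(type(first_three))        # a Series, the same type as the caller
```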
pandas.Series.tail
Series.tail(n=5)
Return the last n rows.
This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
Parameters: | n : int, default 5. Number of rows to select. |
Returns: | type of caller. The last n rows of the caller object. |
See also: head, which returns the first n rows of the caller object.
Examples
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey',
...                               'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
Viewing the last 5 lines
>>> df.tail()
   animal
4  monkey
5  parrot
6   shark
7   whale
8   zebra
Viewing the last n lines (three in this case)
>>> df.tail(3)
  animal
6  shark
7  whale
8  zebra
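The description above mentions using tail() to verify data after sorting. A minimal sketch of that idea on a Series follows; the scores and labels are invented for illustration.

```python
import pandas as pd

# Hypothetical scores, invented for illustration
scores = pd.Series([55, 91, 78, 62, 84, 70],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

ordered = scores.sort_values()   # ascending by default
top_three = ordered.tail(3)      # the three largest values end up last

print(top_three)
```

Because the sort is ascending, the last three rows are the three highest scores, and their original labels travel with them.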
Here we discuss a lot of the essential functionality common to the pandas data structures. Here’s how to create some of the objects used in the examples from the previous section:
# Assumes numpy and pandas are already imported as np and pd.
# Note: pd.Panel was removed in pandas 0.25, so In [4] runs only on older versions.
In [1]: index = pd.date_range('1/1/2000', periods=8)

In [2]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [3]: df = pd.DataFrame(np.random.randn(8, 3), index=index,
   ...:                   columns=['A', 'B', 'C'])
   ...:

In [4]: wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
   ...:               major_axis=pd.date_range('1/1/2000', periods=5),
   ...:               minor_axis=['A', 'B', 'C', 'D'])
   ...:
Head and Tail
To view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.
In [5]: long_series = pd.Series(np.random.randn(1000))

In [6]: long_series.head()
Out[6]:
0    0.229453
1    0.304418
2    0.736135
3   -0.859631
4   -0.424100
dtype: float64

In [7]: long_series.tail(3)
Out[7]:
997   -0.351587
998    1.136249
999   -0.448789
dtype: float64
Attributes and the raw ndarray(s)
pandas objects have a number of attributes that give access to their metadata:
- shape: gives the axis dimensions of the object, consistent with ndarray
- Axis labels
  - Series: index (only axis)
  - DataFrame: index (rows) and columns
  - Panel: items, major_axis, and minor_axis
Note, these attributes can be safely assigned to!
In [8]: df[:2]
Out[8]:
                   A         B        C
2000-01-01  0.048869 -1.360687 -0.47901
2000-01-02 -0.859661 -0.231595 -0.52775

In [9]: df.columns = [x.lower() for x in df.columns]

In [10]: df
Out[10]:
                   a         b         c
2000-01-01  0.048869 -1.360687 -0.479010
2000-01-02 -0.859661 -0.231595 -0.527750
2000-01-03 -1.296337  0.150680  0.123836
2000-01-04  0.571764  1.555563 -0.823761
2000-01-05  0.535420 -1.032853  1.469725
2000-01-06  1.304124  1.449735  0.203109
2000-01-07 -1.032011  0.969818 -0.962723
2000-01-08  1.382083 -0.938794  0.669142
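The session above assigns to a DataFrame's columns. The same attributes can be inspected and assigned on a Series, as in this small sketch (the values and labels are chosen here for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

print(s.shape)          # (5,)
print(list(s.index))    # ['a', 'b', 'c', 'd', 'e']

# Assigning a new index relabels the existing values
s.index = [label.upper() for label in s.index]
print(list(s.index))    # ['A', 'B', 'C', 'D', 'E']
```

The new index must have the same length as the Series; the values themselves are unchanged.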