Pandas Series in Python

Pandas is an open-source Python library widely used for data science, data analysis, and machine learning tasks. The name pandas is derived from "panel data", an econometrics term for multi-dimensional structured data sets.

Installing pandas:

pip install pandas
pip install numpy

pandas is built on top of NumPy, which provides support for multi-dimensional arrays. NumPy is a required dependency of pandas, so pip installs it automatically; the second command is only needed if you want NumPy on its own. pandas adopts many conventions from NumPy. The biggest difference is that pandas is designed for tabular, heterogeneous data, whereas NumPy is designed for homogeneous numerical data.
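A small sketch of that difference, using made-up sample values (the column names here are illustrative only): a NumPy array settles on one dtype for all elements, while each DataFrame column keeps its own.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype: the integers are upcast to float.
arr = np.array([1, 2.5, 3])
print(arr.dtype)

# A DataFrame can hold a different dtype in every column.
df = pd.DataFrame({
    'name': ['Anna', 'Benny'],
    'age': [24, 25],
    'score': [88.5, 92.0],
})
print(df.dtypes)
```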

Pandas data structures:

pandas has historically provided three data structures, distinguished by dimensionality.

  • Series – one dimension
  • DataFrame – two dimensions
  • Panel – three dimensions

Data Structure | Dimensions | Description
Series | 1 | 1D labeled homogeneous array, size-immutable.
DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel | 3 | General 3D labeled, size-mutable array.

These data structures are built on top of NumPy arrays, which makes them fast. Of the three, Series and DataFrame are by far the most widely used in data science and analysis tasks. In spreadsheet terms, a Series is a single column, a DataFrame is a full sheet of rows and columns, and a Panel is a group of sheets, each holding one DataFrame.

We can think of a higher-dimensional data structure as a container of lower-dimensional ones.

This article focuses on the pandas Series data structure and briefly introduces DataFrame and Panel.

Series Data Structure in Pandas:

A Series is a one-dimensional, array-like data structure with some additional features. It has a single axis, whose labels form the index of the Series. It is homogeneous (all values share one dtype) and size-immutable; the values themselves can be changed in place.

It consists of two components.

  • One-dimensional data (values)
  • Index

Syntax of series:
pandas.Series(data, index, dtype, copy)
Parameters:

data – takes forms such as a one-dimensional ndarray, list, dictionary, or scalar value.

index – index labels must be hashable and the same length as data. Duplicate labels are allowed, although unique labels make lookups unambiguous. If no index is passed, it defaults to 0, 1, ..., n-1 (equivalent to np.arange(n)).

dtype – the data type. If None, the data type is inferred.

copy – whether to copy the input data. Default False.
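The parameters above can be sketched as follows. The values and labels are arbitrary sample data chosen for illustration.

```python
import pandas as pd

# No index passed: labels default to 0, 1, 2.
s_default = pd.Series([10, 20, 30])
print(list(s_default.index))

# Scalar data is repeated once for every label in the index.
s_scalar = pd.Series(5, index=['a', 'b', 'c'])
print(s_scalar)

# dtype forces a type instead of letting pandas infer one.
s_float = pd.Series([1, 2, 3], dtype='float64')
print(s_float.dtype)
```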

Example:

Creation of Series from ndarray:

import pandas as pd
import numpy as np
data = np.array(['Anna','Benny','Cathrin','Daniel'])
s = pd.Series(data,index=[11,12,13,14])
print(s)
#retrieving data with index
print(s[11])

Creation of Series from Dictionary:

import pandas as pd
import numpy as np
data = {'Anna' : 11, 'Benny' : 12, 'Cathrin' : 13, 'Daniel' : 14}
s = pd.Series(data)
print(s)
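#accessing the data by position; s[0] on a string-labelled index is deprecated in recent pandas
print(s.iloc[0])
#accessing the data by label
print(s['Anna'])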

DataFrame:

A DataFrame is a two-dimensional data structure with labelled axes (rows and columns). It has three components:

  • Data (heterogeneous data)
  • Rows (horizontal)
  • Columns (vertical)

Each column of DataFrame is a separate pandas Series. DataFrames are both value and size mutable.
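A minimal sketch of those two statements, using invented sample data (the 'City' column is an assumption added purely for the demonstration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Benny'], 'Age': [24, 25]})

# Selecting a single column returns a Series.
age = df['Age']
print(type(age).__name__)

# Value-mutable: an existing cell can be overwritten.
df.loc[0, 'Age'] = 30

# Size-mutable: a new column can be added after creation.
df['City'] = ['Berlin', 'Bonn']
print(df.shape)
```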

Syntax:
pandas.DataFrame(data, index, columns, dtype, copy)

DataFrame accepts many different types of input data, including:

  • A two-dimensional ndarray
  • A dictionary of dictionaries
  • A dictionary of lists
  • A dictionary of series
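Two of the input types listed above, a dictionary of lists and a dictionary of Series, can be sketched like this. The names, ages and index labels are sample values only.

```python
import pandas as pd

# Dictionary of lists: keys become column names, rows get a default index.
from_lists = pd.DataFrame({'Name': ['Anna', 'Kathrin'], 'Age': [24, 25]})
print(from_lists)

# Dictionary of Series: rows are aligned on the union of the Series indexes,
# and missing combinations are filled with NaN.
from_series = pd.DataFrame({
    'Name': pd.Series(['Anna', 'Kathrin'], index=[11, 12]),
    'Age': pd.Series([24, 25, 23], index=[11, 12, 13]),
})
print(from_series)
```

In the second frame, label 13 has an age but no name, so its Name cell is NaN.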

DataFrame creation using ndarray:

import pandas as pd
import numpy as np
# A NumPy array holds a single dtype, so the ages are stored as strings here.
a = np.array([['Anna', 24], ['Kathrin', 25], ['Manuel', 23], ['Daniel', 22], ['Thomas', 27]])
df = pd.DataFrame(a, index=[1, 2, 3, 4, 5], columns=['Name', 'Age'])
print(df)

DataFrame creation using dictionary of dictionaries:

import pandas as pd
import numpy as np
a={'name':{11:'Anna',12:'Kathrin',13:'Manuel',14:'Daniel',15:'Thomas'},
  'age': {11:24,12:25,13:23,14:22,15:27}}
df=pd.DataFrame(a)
print(df)
print(df.index)
print(df.values)
print(df.columns)

Panels:

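The pandas Panel data structure was used for working with three-dimensional data. It can be seen as a set of DataFrames. Like a DataFrame, it holds heterogeneous data and is both value- and size-mutable. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the syntax below only works on older releases.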

Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

items: each item on this axis corresponds to one DataFrame; this is axis 0.

major_axis: the rows (index) of each DataFrame; this is axis 1.

minor_axis: the columns of each DataFrame; this is axis 2.
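On pandas versions without Panel, one common way to hold a "set of DataFrames" is a single DataFrame with a two-level index. The following is only a sketch of that substitute technique, not the Panel API; the quarter labels, regions and figures are invented for illustration.

```python
import pandas as pd

# Two DataFrames that share the same rows and columns (sample data).
q1 = pd.DataFrame({'sales': [10, 20], 'returns': [1, 2]}, index=['north', 'south'])
q2 = pd.DataFrame({'sales': [15, 25], 'returns': [0, 3]}, index=['north', 'south'])

# The dictionary keys play the role of Panel "items"; they become the outer
# level of a MultiIndex, with the original row labels as the inner level.
stacked = pd.concat({'q1': q1, 'q2': q2}, names=['item', 'region'])
print(stacked)

# Selecting one item gives back an ordinary two-dimensional DataFrame.
print(stacked.loc['q2'])
```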

We will learn more about the DataFrame and Panel data structures in coming articles.

numpy.tril() in Python

numpy.tril() returns the lower triangle of an array. The result is a copy of the array in which every element above the k-th diagonal is set to zero.

Syntax:

numpy.tril(m,k=0)

Parameters:

m – the input array (array-like), not a row count.

k – optional. Diagonal above which to zero elements. k=0, the default, is the main diagonal.

k>0 is above it and k<0 is below it.

Return value:

It returns a copy of the matrix with elements above the kth diagonal zeroed.
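A short sketch of a positive k, using a 3x3 sample array: with k=1 the main diagonal and the first diagonal above it are both kept.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# k=1 zeroes only the elements above the first diagonal over the main one.
lower_k1 = np.tril(m, k=1)
print(lower_k1)
# [[1 2 0]
#  [4 5 6]
#  [7 8 9]]

# tril returns a copy, so the input array is unchanged.
print(m[0, 2])  # 3
```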

Example:

import numpy as np
m = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Sample array")
print(m)
print("\ntril() function without any parameter:")
print(np.tril(m))
print("\nElements above the 1st diagonal below the main one zeroed (k=-1).")
print(np.tril(m,-1))
print("\nElements above the 2nd diagonal below the main one zeroed (k=-2).")
print(np.tril(m,-2))

numpy.triu() in Python

numpy.triu() returns the upper triangle of an array. The result is a copy of the array in which every element below the k-th diagonal is set to zero.

Syntax:

numpy.triu(m,k=0)

Parameters:

m – the input array (array-like), not a row count.

k – optional. Diagonal below which to zero elements. k=0, the default, is the main diagonal.

k>0 is above it and k<0 is below it.

Return value:

It returns a copy of the matrix with elements below the kth diagonal zeroed.
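A short sketch of a positive k, again on a 3x3 sample array. It also checks that the strictly lower triangle from numpy.tril() and the upper triangle from numpy.triu() add back up to the original array.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# k=1 zeroes the main diagonal and everything below it.
upper_k1 = np.triu(m, k=1)
print(upper_k1)
# [[0 2 3]
#  [0 0 6]
#  [0 0 0]]

# Strictly lower part (k=-1) plus upper part including the diagonal (k=0)
# reconstructs the original array.
rebuilt = np.tril(m, k=-1) + np.triu(m)
print(np.array_equal(rebuilt, m))  # True
```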

Example:

import numpy as np
m = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Sample array")
print(m)
print("\ntriu() function without any parameter:")
print(np.triu(m))
print("\nElements below the 1st diagonal below the main one zeroed (k=-1).")
print(np.triu(m,-1))
print("\nElements below the 2nd diagonal below the main one zeroed (k=-2).")
print(np.triu(m,-2))

numpy.ndarray.flatten() in Python:

NumPy's flatten() converts a multi-dimensional array into a "flattened" one-dimensional array. It returns a copy of the array in one dimension.

Syntax:

ndarray.flatten(order='C')

Parameters:

  • order: the order in which items from the NumPy array are read.
  • 'C' – read items row-wise, i.e. in row-major order
  • 'F' – read items column-wise, i.e. in column-major order
  • 'K' – read items in the order in which they occur in memory
  • 'A' – read items column-wise if the array is Fortran contiguous in memory, and row-wise otherwise
  • The default is 'C'

Returns:

A copy of the input array, flattened to a 1D array.
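The 'A' and 'K' orders only differ from 'C' when the array is not stored row by row. A minimal sketch, assuming a small sample array converted to column-major layout with np.asfortranarray:

```python
import numpy as np

c_arr = np.array([[1, 2], [3, 4]])   # default C (row-major) memory layout
f_arr = np.asfortranarray(c_arr)     # same values, column-major memory layout

print(c_arr.flatten('A'))  # [1 2 3 4]  not Fortran contiguous, so row-wise
print(f_arr.flatten('A'))  # [1 3 2 4]  Fortran contiguous, so column-wise
print(f_arr.flatten('K'))  # [1 3 2 4]  order of elements in memory
print(f_arr.flatten('C'))  # [1 2 3 4]  'C' ignores the memory layout
```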

Example:

Flatten an array by row:

import numpy as np
a = np.array([[1,2,4], [3,5,7],[4,6,8]])
b = a.flatten('C')
print('Flattened array by row:\n', b)

Flatten an array by column:

import numpy as np
a = np.array([[1,2,4], [3,5,7],[4,6,8]])
b = a.flatten('F')
print('Flattened array by column:\n', b)

ndarray.flatten() returns a copy of the original array, so any changes made to the flattened array are not reflected in the original array.

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat_array = a.flatten()
# Modify the copy only
flat_array[2] = 10
print('Flattened 1D Numpy Array:')
print(flat_array)
print('Original Numpy Array')
print(a)

In the code above, setting flat_array[2] = 10 changes only the copy; the original array is unaffected.


numpy.log() or np.log() in Python

What is numpy.log()?

numpy.log() is a mathematical function that calculates the natural logarithm of x, where x is each element of the input array. The natural logarithm is the inverse of the exponential function, so log(exp(x)) = x. It is the logarithm to base e. It works on a single number or element-wise on an array.
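The inverse relationship can be checked numerically. This is a small sketch with arbitrary sample values; np.allclose is used because floating-point results may differ from the exact values in the last digits.

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0, 10.0])

# Applying exp and then log should give back the original values.
round_trip = np.log(np.exp(x))
print(np.allclose(round_trip, x))  # True
```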

Syntax:

numpy.log(x, out=None, where=True)

Parameters:

The three main parameters of np.log are x, out and where.

out and where are rarely used.

x provides the input to the function. It accepts array-like objects, so a Python list works as well as a NumPy array.
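Although out and where are rarely needed, a brief sketch shows what they do: where selects the elements to compute, and out is the array that receives the results. The input values are sample data, and the output array is pre-filled with zeros so the skipped position has a defined value.

```python
import numpy as np

x = np.array([1.0, 0.0, np.e])

# Pre-filled output array; positions where the condition is False keep 0.0.
result = np.zeros_like(x)

# Compute the log only where x is positive, writing into result.
np.log(x, out=result, where=x > 0)
print(result)
```

Here the middle element is skipped, so no logarithm of 0 is evaluated for it.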

How to use np.log() in Python?

First, import the numpy module:

import numpy as np

Now the numpy module is imported.

np.log with a single number:

We will apply log to a plain number and to the mathematical constant e, Euler's number.

import numpy as np
print(np.log(2))
print(np.e)
print(np.log(np.e))

What happens if we take the log of 0?

Let's see:

import numpy as np
# Prints -inf; NumPy also emits a divide-by-zero RuntimeWarning.
print(np.log(0))

Calculating log with base 2 and base 10:

import numpy as np
print(np.log2(8))
print(np.log2(30))
print(np.log10(30))
print(np.log10(100))
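For bases other than e, 2 and 10, the change-of-base formula log_b(x) = ln(x) / ln(b) can be used. The helper below is a hypothetical function written for this sketch (log_base is not part of NumPy).

```python
import numpy as np

def log_base(x, base):
    """Hypothetical helper: logarithm of x to an arbitrary base."""
    return np.log(x) / np.log(base)

# Agrees with the built-in base-2 function.
print(np.isclose(log_base(8, 2), np.log2(8)))  # True

# Base 3, which has no dedicated NumPy function.
print(log_base(81, 3))
```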

Calculating log on 1D array:

To calculate the logarithm of a 1D array:

import numpy as np
print(np.log([3,4,5,7]))
arr1 = np.array([1,3,5,5**3])
print(arr1)
arr2 = np.log(arr1)
print(arr2)
arr3 = np.log2(arr1)
print(arr3)
arr4 = np.log10(arr1)
print(arr4)

Calculating log on 2D array:

To calculate the logarithm of a 2D array:

import numpy as np
nparray = np.arange(1,10).reshape(3,3)
print("Original np 2D Array:\n",nparray)
print("\n Logarithmic value of 2D np array elements:\n",np.log(nparray))

Plotting log using matplotlib:

Let's try plotting the logarithmic functions using matplotlib.

import numpy as np
import matplotlib.pyplot as plt
arr1 = [3,2.1,2.2,3.2,3.3]
result1=np.log(arr1)
result2=np.log2(arr1)
result3=np.log10(arr1)
plt.plot(arr1,arr1, color='green', marker="")
plt.plot(result1,arr1, color='red', marker="o")
plt.plot(result2,arr1, color='blue', marker="")
plt.plot(result3,arr1, color='black', marker="o")
plt.show()