Pandas Series in Python

Pandas is an open-source Python library widely used for data science, data analysis, and machine learning tasks. The name comes from "panel data", an econometrics term for multi-dimensional structured data sets.

Installing pandas:

pip install pandas
pip install numpy

pandas is built on top of NumPy, which provides support for multi-dimensional arrays. pip installs NumPy automatically as a dependency of pandas, so the second command matters only if you want to install or upgrade NumPy explicitly. Pandas adopts many idioms from NumPy. The biggest difference is that pandas is designed for tabular, heterogeneous data, whereas NumPy is designed for homogeneous numerical arrays.

Pandas data structures:

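Pandas has historically provided three data structures, distinguished by dimensionality. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so current releases offer only Series and DataFrame.

  • Series – one dimension
  • DataFrame – two dimensions
  • Panel – three dimensions (removed in pandas 0.25)

Data Structure    Dimensions    Description
Series            1             1D labeled homogeneous array, size-immutable.
DataFrame         2             General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel             3             General 3D labeled, size-mutable array (removed in pandas 0.25).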

These data structures are built on top of NumPy arrays, which makes them fast. Series and DataFrame are the ones used in data science and analysis tasks. In spreadsheet terms, a Series is a single column, a DataFrame is a sheet of rows and columns, and a Panel was a group of sheets, that is, a collection of DataFrames.

We can think of a higher-dimensional data structure as a container of lower-dimensional data structures.

In this article we look at the pandas Series data structure.

Series Data Structure in Pandas:

A Series is a one-dimensional, array-like data structure with some additional features. It has a single axis, whose labels form the index of the Series. It is homogeneous (all values share one dtype). Its values can be changed, but its size cannot.

It consists of two components.

  • One-dimensional data (values)
  • Index
Syntax of series:
pandas.Series(data, index, dtype, copy)
Parameters:

data – accepts a one-dimensional ndarray, list, dictionary, or scalar value.

index – index labels must be hashable and the same length as data. Labels do not have to be unique. If no index is passed, the default labels run from 0 to n-1.

dtype – the data type of the values. If None, the data type is inferred.

copy – whether to copy the input data. The default is False.
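
The parameters above can be tried out with a small sketch. The variable names (prices, filled, as_float) are made up for illustration; only the pd.Series call itself comes from the syntax above.

```python
import pandas as pd

# No index passed: pandas assigns the default 0..n-1 labels.
prices = pd.Series([1.5, 2.0, 3.25])

# A scalar is repeated once for every label in the index.
filled = pd.Series(7, index=["a", "b", "c"])

# dtype forces the value type instead of letting pandas infer it.
as_float = pd.Series([1, 2, 3], dtype="float64")

print(list(prices.index))   # default labels
print(filled.tolist())      # scalar repeated over the index
print(as_float.dtype)
```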

Example:

Creation of Series from ndarray:

import pandas as pd
import numpy as np
data = np.array(['Anna','Benny','Cathrin','Daniel'])
s = pd.Series(data,index=[11,12,13,14])
print(s)
#retrieving data with index
print(s[11])

Creation of Series from Dictionary:

import pandas as pd
import numpy as np
data = {'Anna' : 11, 'Benny' : 12, 'Cathrin' : 13, 'Daniel' : 14}
s = pd.Series(data)
print(s)
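#accessing the data by label
print(s['Anna'])
#accessing the data by position (s[0] on a non-integer index is deprecated in recent pandas)
print(s.iloc[0])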

DataFrame:

DataFrame is a two dimensional data structure having labelled axes as rows and columns. Here we have three components

  • Data (heterogeneous data)
  • Rows (horizontal)
  • Columns (vertical)

Each column of DataFrame is a separate pandas Series. DataFrames are both value and size mutable.

Syntax:
pandas.DataFrame(data, index, columns, dtype, copy)

DataFrame accepts many different types of arguments

  • A two-dimensional ndarray
  • A dictionary of dictionaries
  • A dictionary of lists
  • A dictionary of series
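
Two of the argument types listed above, a dictionary of lists and a dictionary of Series, can be sketched as follows. The names and sample values are illustrative assumptions.

```python
import pandas as pd

# Dictionary of lists: each key becomes a column; the lists must share one length.
from_lists = pd.DataFrame({"name": ["Anna", "Kathrin"], "age": [24, 25]})

# Dictionary of Series: rows are aligned on the Series index labels.
names = pd.Series(["Anna", "Kathrin"], index=[11, 12])
ages = pd.Series([24, 25], index=[11, 12])
from_series = pd.DataFrame({"name": names, "age": ages})

print(from_lists)
print(from_series)
```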

DataFrame creation using ndarray:

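import pandas as pd
import numpy as np

# dtype=object keeps the ages as integers; without it NumPy converts every element to a string
a = np.array([['Anna',24],['Kathrin',25],['Manuel',23],['Daniel',22],['Thomas',27]], dtype=object)
df = pd.DataFrame(a, index=[1,2,3,4,5], columns=['Name', 'Age'])
print(df)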

DataFrame creation using dictionary of dictionaries:

import pandas as pd
import numpy as np
a={'name':{11:'Anna',12:'Kathrin',13:'Manuel',14:'Daniel',15:'Thomas'},
  'age': {11:24,12:25,13:23,14:22,15:27}}
df=pd.DataFrame(a)
print(df)
print(df.index)
print(df.values)
print(df.columns)

Panels:

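The pandas Panel data structure was used for working with three-dimensional data and can be seen as a set of DataFrames. It held heterogeneous data and was both value- and size-mutable. Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the syntax below works only on older releases. With current pandas, three-dimensional data is usually represented as a DataFrame with a MultiIndex.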

Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

Items: Each item in this axis corresponds to one dataframe and it is axis 0

Major_axis: This contains rows or indexes of the dataframe and it is axis 1

Minor_axis: This contains columns or values of the dataframe and it is axis 2

We will learn more about the DataFrame and Panel data structures in coming articles.

Magic / Dunder Methods in Python

Magic methods, also known as dunder methods, are special methods in Python's object-oriented programming model that enrich your classes. They are distinguished from other methods by the double underscores before and after the method name.

These methods allow instances of a class to interact with built-in functions and operators. The name “dunder” comes from “double underscore”. They can be thought of as a contract between your implementation and the Python interpreter: under given circumstances, Python performs certain actions behind the scenes by calling them. Operator overloading is built on them.

Why the name magic methods?

You don't need to call them directly as you do other methods. Python calls them for you internally when a certain action takes place. For example, when you add two numbers with the + operator, Python calls the __add__() method internally.
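
As a minimal sketch of this behind-the-scenes call, the operator form and the explicit dunder call can be compared directly. The variable names are illustrative.

```python
# The + operator and an explicit __add__() call give the same result.
via_operator = 2 + 3
via_dunder = (2).__add__(3)

# len() is routed to __len__() in the same way.
items = ["a", "b", "c"]

print(via_operator, via_dunder)
print(len(items), items.__len__())
```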

Well known Dunder method: __init__()

If you have already used classes in Python, you have probably seen the __init__() method. It plays the role of a constructor: it receives the arguments passed when the class is called and uses them to initialize the new object.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
p = Point(10,5)
print(p)  

Gives output as

<__main__.Point object at 0x00000205182133A0>

When Point(10, 5) is called, __init__() is invoked with the newly created object passed as self. The remaining arguments of the call (10 and 5) are passed as x and y.

Printing the object shows only its class name and memory address, because Point does not define its own string representation. For object representation we have the __repr__() and __str__() methods.

__repr__():

It is the official or formal string representation of objects.

class Point:      
    def __init__(self, x, y):
        self.x = x
        self.y = y     
    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

p = Point(10,5)
print(p)
Output:
Point(x=10, y=5)

This gives the name of class and value of properties of the class. The goal here is to be unambiguous. Very helpful in debugging.

__str__():

It is used for informal string representation of object. If not implemented, __repr__ will be used as a fallback. Here goal is to be readable.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return f"Point(x={self.x}, y={self.y})"

p = Point(10,5)
print(p)
Output:
Point(x=10, y=5)

When you call print(), it first looks for __str__(). If that is not defined, it falls back to __repr__().
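
The lookup order can be checked with a small sketch that defines both methods on one class and only __repr__() on another. OnlyRepr is a hypothetical class added for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, aimed at developers.
        return f"Point(x={self.x}, y={self.y})"

    def __str__(self):
        # Readable form, aimed at end users.
        return f"({self.x}, {self.y})"


class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


p = Point(10, 5)
print(str(p))            # uses __str__
print(repr(p))           # uses __repr__
print(str(OnlyRepr()))   # no __str__, so __repr__ is the fallback
```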

__new__():

When you create an instance of a class, Python first calls the __new__() method to create the object and then calls the __init__() method to initialize the object’s attributes. It is a static method of the object class.

class SquareNumber(int):
    def __new__(cls, value):
        return super().__new__(cls, value ** 2)
x = SquareNumber(3)
print(x)
Output:

9

__add__():

The magic method __add__() defines what the + operator does for objects of a class. In the example below, adding two concat objects with + calls __add__(), which joins their val attributes and returns a new concat object.

class concat:
  
    def __init__(self, val):
        self.val = val
          
    def __add__(self, val2):
        return concat(self.val + val2.val)
  
obj1 = concat("Hello")
obj2 = concat("Abaython")
obj3 = obj1 + obj2
print(obj3.val)
Output:
HelloAbaython

Magic or dunder methods let user-defined objects emulate the behaviour of built-in types. They are a core Python feature and should be used as needed.

Points to note:

  • The __init__() method is automatically invoked during the time of the creation of an instance of a class. It is called a magic method because it is called automatically by Python.
  • The __repr__() magic method returns the string representation of a class instance.
  • The __new__() magic method is implicitly called before the __init__() method. It returns a new object, which is then initialized by __init__().
  • The built-in str() function returns a string for the object passed to it by internally calling the __str__() method defined on that object's class.
  • The __add__() method is called internally when you use the + operator, whether to add two numbers or to concatenate strings.

f-strings in Python

Python f-strings, or formatted string literals, are a way of formatting strings introduced in Python 3.6 by PEP 498. The feature is also called literal string interpolation. f-strings provide a better way to format strings and make debugging easier too.

What are f-strings?

Strings in Python are usually enclosed in double quotes (" ") or single quotes (' '). To create an f-string, you only need to add an f or an F before the opening quote of your string. For example,

"Anna" is a string whereas f"Anna" is an f-string.

f-Strings provide a concise and convenient way to embed python expressions inside string literals for formatting.

Why do we need f-Strings?

Before Python 2.6, to format a string one would use either the % operator or the string.Template class. Later the str.format() method came along and added a more flexible and robust way of formatting a string to the language.

% formatting: great for simple formatting, but it supports only a fixed set of conversion types (strings, integers, floats and so on), and tuples and dictionaries need special handling, which makes it error-prone.

msg = 'hello world'
'msg: %s' % msg

Output:
'msg: hello world'
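
Template strings:

Template strings substitute named placeholders such as $msg with values supplied as keyword arguments or as a dictionary. A placeholder can only be a plain name, so function calls and other expressions are not allowed inside the template.

from string import Template
msg = 'hello world'
Template('msg: $msg').substitute(msg=msg)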

Output:
'msg: hello world'

String format():

The str.format() method was introduced to overcome the limitations of % formatting and template strings, but it is still verbose.

age = 3 * 10
'My age is {age}.'.format(age=age)

Output:
'My age is 30.'

f-strings were introduced to give string formatting a minimal syntax.
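
To make the comparison concrete, here is a sketch that builds the same sentence with all four approaches. The variable names and the sentence are illustrative assumptions.

```python
from string import Template

name = "Anna"
age = 3 * 10

percent_style = "%s is %d years old." % (name, age)
template_style = Template("$name is $age years old.").substitute(name=name, age=age)
format_style = "{} is {} years old.".format(name, age)
fstring_style = f"{name} is {age} years old."

print(fstring_style)
```

All four variables hold the same string. The f-string is the only form where the values appear at the point of use.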

f-String Syntax:
f"This is an f-string {var_name1} and {var_name2}."

Variables are enclosed in curly braces {}.

Example:

val1 = 'Abaython'
val2 = 'Python'
print(f"{val1} is a portal for {val2}.")

Output:
Abaython is a portal for Python.

How to evaluate expressions in f-String?

We can evaluate any valid Python expression on the fly inside the curly braces:

num1 = 25
num2 = 45
print(f"The product of {num1} and {num2} is {num1 * num2}.")

Output:
The product of 25 and 45 is 1125.

How to call functions using f-String?

def mul(x, y):
    return x * y

print(f'Product(10,20) = {mul(10, 20)}')

Output:
Product(10,20) = 200

numpy.tril() in Python

numpy.tril() returns the lower triangle of an array. More precisely, it returns a copy of the array with the elements above the kth diagonal zeroed.

Syntax:

numpy.tril(m,k=0)

Parameters:

m – array_like, the input array.

k – optional. The diagonal above which to zero elements. k=0 (the default) is the main diagonal, k>0 is above it and k<0 is below it.

Return value:

It returns a copy of the matrix with elements above the kth diagonal zeroed.
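
The effect of k is easiest to see side by side. This is a small sketch on a 3x3 array; the variable names are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

main = np.tril(m)             # k=0: keep the main diagonal and everything below it
wider = np.tril(m, k=1)       # k=1: also keep the first diagonal above the main one
narrower = np.tril(m, k=-1)   # k=-1: zero the main diagonal as well

print(main)
print(wider)
print(narrower)
```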

Example: 

import numpy as np

m = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Sample array")
print(m)
print("\ntril() function without any parameter:")
print(np.tril(m))
print("\nk=-1: main diagonal and everything above it zeroed.")
print(np.tril(m,-1))
print("\nk=-2: only elements on or below the second sub-diagonal kept.")
print(np.tril(m,-2))

numpy.triu() in Python

numpy.triu() returns the upper triangle of an array. More precisely, it returns a copy of the array with the elements below the kth diagonal zeroed.

Syntax:

numpy.triu(m,k=0)

Parameters:

m – array_like, the input array.

k – optional. The diagonal below which to zero elements. k=0 (the default) is the main diagonal, k>0 is above it and k<0 is below it.

Return value:

It returns a copy of the matrix with elements below the kth diagonal zeroed.
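
Here is a comparable sketch for triu(), again on a 3x3 array with illustrative names. The last line checks that the upper triangle and the strictly lower triangle together rebuild the original array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

main = np.triu(m)            # k=0: keep the main diagonal and everything above it
strict = np.triu(m, k=1)     # k=1: zero the main diagonal as well
wider = np.triu(m, k=-1)     # k=-1: also keep the first diagonal below the main one

print(main)
print(strict)
print(wider)

# Upper triangle plus strictly lower triangle gives back m.
rebuilt = np.triu(m) + np.tril(m, k=-1)
print(np.array_equal(rebuilt, m))
```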

Example:

import numpy as np

m = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Sample array")
print(m)
print("\ntriu() function without any parameter:")
print(np.triu(m))
print("\nk=-1: elements below the first sub-diagonal zeroed.")
print(np.triu(m,-1))
print("\nk=-2: elements below the second sub-diagonal zeroed (none in a 3x3 array).")
print(np.triu(m,-2))

numpy.ravel() in Python

numpy.ravel() is a NumPy function that turns a two-dimensional or multi-dimensional array into a contiguous flattened array, i.e. a one-dimensional array that contains all the input elements and has the same type as the input. A copy is made only when needed. If the input is a masked array, the returned array is also masked.

Syntax:

numpy.ravel(x, order='C')

Parameters:

x : array_like

The input array. Its elements are read in the order specified by the order parameter.

order : {'C', 'F', 'A', 'K'} (optional)

The order parameter is optional and defaults to 'C'.

  • 'C' flattens the array in row-major order. The last axis index changes fastest and the first axis index changes slowest.
  • 'F' (Fortran-style order) flattens the array in column-major order. The first axis index changes fastest and the last axis index changes slowest.
  • 'A' reads the elements in Fortran-like index order if the array is Fortran contiguous in memory, and in C-like order otherwise.
  • 'K' reads the elements in the order in which they occur in memory.

Returns:

This function returns a contiguous flattened array with same data type as input array and has equal size (x.size)

Example1:

import numpy as np

x = np.array([[4, 7, 8, 9], [16, 24, 86,45]])
y = np.ravel(x)
print(y)

Example2:

import numpy as np

x = np.array([[23, 76, 11, 42], [74, 91, 8, 34]])
y = np.ravel(x)
print('Flattened array: \n', y)
y[1] = 121
print('Original array: \n', x)

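The above example shows that ravel() returned a view here: changing y[1] to 121 also changes the original array. This holds only when ravel() can avoid a copy, as it can for a C-contiguous array read in 'C' order. When a copy is needed, changes to the flattened array do not reach the original.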
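
Whether a particular ravel() call returned a view or a copy can be checked with np.shares_memory. The sketch below reuses the array from Example2; the names view and copied are illustrative.

```python
import numpy as np

x = np.array([[23, 76, 11, 42], [74, 91, 8, 34]])

# C-contiguous input read in the default 'C' order: no copy is needed.
view = np.ravel(x)
print(np.shares_memory(x, view))

# Reading the same array in column-major order requires a copy.
copied = np.ravel(x, order="F")
print(np.shares_memory(x, copied))

copied[0] = -1   # only the copy changes
view[1] = 121    # writes through to x
print(x)
```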

Example3:

import numpy as np

a = np.arange(12).reshape(3,4)
print('The original array is:\n',a)
print('\n')
print('After applying ravel function:',a.ravel())
print('\n')
print('Applying ravel function in F-style ordering:',a.ravel(order = 'F'))
print('Applying ravel function in C-style ordering:',a.ravel(order = 'C'))
print('Applying ravel function in K-style ordering:',a.ravel(order = 'K'))
print('Applying ravel function in A-style ordering:',a.ravel(order = 'A'))

With the above examples we have seen the ravel function and its ordering styles in detail.

numpy.ndarray.flatten() in Python:

NumPy's flatten() converts a multi-dimensional array into a flattened one-dimensional array. It always returns a copy of the array in one dimension. Now let us look at numpy.ndarray.flatten().

numpy.ndarray.flatten()

Syntax:

ndarray.flatten(order='C')

Parameters:

  • order: the order in which items of the array are read.
  • 'C' – read items row-wise, i.e. in row-major order
  • 'F' – read items column-wise, i.e. in column-major order
  • 'K' – read items in the order in which they occur in memory
  • 'A' – read items column-wise only when the array is Fortran contiguous in memory, otherwise row-wise
  • The default is 'C'

Returns:

A copy of input array, flattened  to 1D array.

Example:

Flatten an array by row:

import numpy as np
a = np.array([[1,2,4], [3,5,7],[4,6,8]])
b = a.flatten('C')
print('Flattened array by row:\n', b)

Flatten an array by column:

import numpy as np
a = np.array([[1,2,4], [3,5,7],[4,6,8]])
b = a.flatten('F')
print('Flattened array by column:\n', b)

ndarray.flatten() returns a copy of the original array, so any changes made to the flattened array are not reflected in the original array.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat_array = a.flatten()
flat_array[2] = 10
print('Flattened 1D Numpy Array:')
print(flat_array)
print('Original Numpy Array')
print(a)

In the code above, setting the value at index 2 to 10 (flat_array[2] = 10) does not affect the original array. Only the copy is changed.


numpy.log() or np.log() in Python

What is numpy.log()?

numpy.log() is a mathematical function that calculates the natural logarithm of x, where x ranges over the elements of the input array. The natural logarithm is the inverse of the exponential function, so log(exp(x)) = x. It is the logarithm to base e. The function works on a single number as well as on an array.

Syntax:

numpy.log(x, out=None, where=True)

Parameters:

The three main parameters of np.log are x, out and where.

out and where are rarely used.

x provides the input to the function. It accepts array-like objects, so a plain Python list also works as input.
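
As a small sketch of these two points, the snippet below passes a plain list to np.log and checks the inverse relationship with np.exp. The sample values are illustrative.

```python
import numpy as np

values = [1.0, np.e, np.e ** 2]   # a plain Python list is accepted
logs = np.log(values)
print(logs)

# log undoes exp, element by element (up to floating-point rounding).
x = np.array([0.5, 1.0, 2.0])
round_trip = np.log(np.exp(x))
print(np.allclose(round_trip, x))
```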

How to use np.log() in Python?

First we have to import numpy module using the command.

import numpy as np

Now the numpy module is imported.

np.log with a single number:

We will try to apply log on numbers and on mathematical constant e, Euler’s number.

import numpy as np
print(np.log(2))
print(np.e)
print(np.log(np.e))

What will happen if we use log on 0?

Let's see what happens. np.log(0) returns -inf, and NumPy emits a RuntimeWarning about a division by zero.

import numpy as np
print(np.log(0))
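
If the -inf result is expected and the warning is just noise, one option is to silence it locally with np.errstate. This is a minimal sketch, not the only way to handle zeros.

```python
import numpy as np

# errstate silences the divide-by-zero warning inside this block only.
with np.errstate(divide="ignore"):
    result = np.log(0)

print(result)
print(np.isneginf(result))
```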

Calculating log with base 2 and base 10:

import numpy as np
print(np.log2(8))
print(np.log2(30))
print(np.log10(30))
print(np.log10(100))

Calculating log on 1D array:

To calculate the logarithm of a 1D array:

import numpy as np
print(np.log([3,4,5,7]))
arr1=np.array([1,3,5,5**3])
print(arr1)
arr2=np.log(arr1)
print(arr2)
arr3=np.log2(arr1)
print(arr3)
arr4=np.log10(arr1)
print(arr4)

Calculating log on 2D array:

To calculate the logarithm of a 2D array:

import numpy as np
nparray = np.arange(1,10).reshape(3,3)
print("Original np 2D Array:\n",nparray)
print("\n Logarithmic value of 2D np array elements:\n",np.log(nparray))

Plotting log using matplotlib:

Let's try plotting a graph for the logarithmic function using matplotlib. If you are not familiar with matplotlib, we have a separate article on it.

import numpy as np
import matplotlib.pyplot as plt
arr1 = [3,2.1,2.2,3.2,3.3]
result1=np.log(arr1)
result2=np.log2(arr1)
result3=np.log10(arr1)
plt.plot(arr1,arr1, color='green', marker="")
plt.plot(result1,arr1, color='red', marker="o")
plt.plot(result2,arr1, color='blue', marker="")
plt.plot(result3,arr1, color='black', marker="o")
plt.show()

Global and local variable in Python

Python Variable:

A variable is a named container for storing data. Variables can store strings, numbers, lists or any other data type.

Example:

name = 'Thomas'

name is a variable that stores a string.

When we assign two values to the same variable one after the other, the most recent value overwrites the earlier one. For example

a = 10

a = 20

When you read a, it shows 20, as that is the most recent value.

In Python programming we deal with two kinds of variables: global and local.

Global Variable:

In Python, a variable created outside of any function is called a global variable. Its scope is the whole program, so it can be read both inside and outside functions. It is often declared at the top of the program.

Trying to change value of Global variable inside a function:

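Assigning to a global variable inside a function, without declaring it global, makes Python treat the name as local to that function. If the function reads the name before that assignment, an UnboundLocalError is raised.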
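
This sketch reproduces the error. The names counter and increment are made up for illustration.

```python
counter = 10  # global variable


def increment():
    # The assignment below makes `counter` local to this function,
    # so reading it on the right-hand side fails.
    counter = counter + 1
    return counter


try:
    increment()
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print(error_name)
```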

Local Variable:

In Python, a variable created inside a function is called a local variable. Its scope is limited to the function in which it is defined.

Accessing local variable outside scope:

A NameError is raised.
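
A minimal sketch of this situation follows. The function greet and the variable message are illustrative names.

```python
def greet():
    message = "hello"  # local to greet()
    return message


greeting = greet()

try:
    print(message)  # `message` does not exist outside greet()
except NameError as exc:
    error_name = type(exc).__name__
    print(error_name)
```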

Global Keyword:

The global keyword is needed only when we want to assign to a global variable from inside a function. It lets code in a local context change the global variable.

Syntax:

def f1():
    global variable

The fundamental rules of the 'global' keyword are as follows:
  • When you create a variable inside a function, it is in a local context by default.
  • When you create or define a variable outside a function, it is in the global context by default. There is no need for the global keyword here.
  • The global keyword is needed to modify a global variable inside a function. Reading a global variable works without it.
  • Using the global keyword outside of a function has no effect.
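
These rules can be sketched with a running total that a function updates. The names total and add_to_total are assumptions chosen for the example.

```python
total = 0  # global variable


def add_to_total(amount):
    global total            # rebind the module-level name instead of creating a local
    total = total + amount


add_to_total(5)
add_to_total(7)
print(total)
```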

Using both Global and local variables:

Global and local variable with same name:
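
In this sketch a global and a local variable share the name x (an illustrative choice). The local one shadows the global one inside the function and leaves it untouched.

```python
x = "global x"


def show():
    x = "local x"   # a separate local variable that shadows the global one
    return x


inside = show()
print(inside)  # value seen inside the function
print(x)       # the global is unchanged
```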

*args and **kwargs in Python

Why *args and **kwargs?

A function lets us write reusable code. We call a function with specific values, called function arguments.

Now let us try to understand why *args and **kwargs are needed.

For example, in a simple program that multiplies two numbers, what happens if we pass three arguments instead of two?

A TypeError is raised. When we don't know the number of arguments in advance, we can use *args and **kwargs to accept a variable number of arguments.
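
The failing call can be reproduced with a short sketch. The function multiply is a hypothetical stand-in for the two-number program mentioned above.

```python
def multiply(a, b):
    return a * b


product = multiply(2, 3)
print(product)

try:
    multiply(2, 3, 4)   # one argument too many
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```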

Special Symbols for passing arguments:

Two special symbols are used in Python for passing a variable number of arguments.

  • *args (Non-keyword argument)
  • **kwargs (keyword argument)

We use this notation when we don't know the number of arguments in advance.

*args:

*args is used to pass a variable number of non-keyword (positional) arguments. The * before args means that the extra positional arguments are collected into a tuple. Inside the function, that tuple is available under the parameter name without the *. The name args itself is only a convention.

Using *args:
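
This sketch rewrites a multiply function with *args so that it accepts any number of values. The function name and sample values are illustrative.

```python
def multiply(*args):
    # args is a tuple holding every positional argument
    result = 1
    for number in args:
        result *= number
    return result


print(multiply(2, 3))
print(multiply(2, 3, 4))
print(multiply())        # empty tuple, so the loop never runs
```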

**kwargs:

**kwargs allows us to pass a variable number of keyword arguments. The ** before kwargs means that the extra keyword arguments are collected into a dictionary. Inside the function, that dictionary is available under the parameter name without the **. The name kwargs itself is only a convention.

Using **kwargs:
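
Here is a minimal sketch of a function that accepts arbitrary keyword arguments. The function describe and its keywords are made up for illustration.

```python
def describe(**kwargs):
    # kwargs is a dict mapping each keyword to its value
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())


summary = describe(name="Anna", age=24)
print(summary)
```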

Using both *args and **kwargs:
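
Both forms can be combined with a regular argument, as in the sketch below. The function report and the values passed to it are hypothetical.

```python
def report(title, *args, **kwargs):
    # title is a regular argument; extras land in args (tuple) and kwargs (dict)
    return title, args, kwargs


result = report("scores", 90, 85, subject="math", term=1)
print(result)
```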

Argument order:

Argument order is important in function declarations. The order must always be:

  • Regular arguments
  • *args
  • **kwargs

What happens if we change the order? Let's see!

A SyntaxError is raised.
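
Because a SyntaxError stops a file from running at all, this sketch feeds the wrongly ordered definition to compile() as a string so the error can be caught. The source text in bad_source is a made-up example.

```python
# Hypothetical function source with **kwargs placed before *args.
bad_source = "def report(title, **kwargs, *args):\n    pass\n"

try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as exc:
    error_name = type(exc).__name__
    print(error_name)
```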

Unpacking operator * :

The unpacking operator * is used to unpack and merge lists and tuples.

Unpacking using *

Packing or merging list and tuples using *
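
Both uses of * can be sketched together: unpacking a sequence into arguments or variables, and merging sequences into a new list. All names and values are illustrative.

```python
def add(a, b, c):
    return a + b + c


numbers = [1, 2, 3]
total = add(*numbers)          # unpacks the list into three arguments
print(total)

first, *rest = (10, 20, 30)    # the starred target collects the remaining items
print(first, rest)

merged = [*numbers, *(4, 5)]   # merge a list and a tuple into a new list
print(merged)
```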

Unpacking operator **

** is used to unpack and merge dictionaries.
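
This sketch shows both uses of ** with dictionaries. The function connect and the settings dictionaries are hypothetical.

```python
def connect(host, port):
    return f"{host}:{port}"


settings = {"host": "localhost", "port": 8080}
address = connect(**settings)        # keys become keyword argument names
print(address)

defaults = {"port": 80, "debug": False}
overrides = {"port": 8080}
merged = {**defaults, **overrides}   # later values win on duplicate keys
print(merged)
```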

Points to Note:

  • *args and **kwargs are special syntax, not keywords, for accepting a variable number of arguments. The names args and kwargs are conventions.
  • *args collects a variable number of non-keyword arguments into a tuple. The * operator also unpacks lists and tuples.
  • **kwargs collects a variable number of keyword arguments into a dictionary. The ** operator also unpacks dictionaries.
  • *args and **kwargs make a function flexible.