How to visualize data using the matplotlib library

Data Visualization

Data visualization is a fundamental part of a data scientist's work. It is the process of presenting complex information in a visual form so that it is easier to understand and yields more insight.

It communicates relationships in the data through images. This is important because trends and patterns are easier to see in a picture than in raw numbers.

It helps turn complex numbers into a story that people can easily understand.

Its primary uses are:

  • Explore data
  • Communicate data

In this article, let’s explore the matplotlib library in Python.

matplotlib:

A wide variety of tools exist for visualizing data in Python, such as matplotlib, pandas visualization, seaborn, ggplot and plotly. This article focuses on matplotlib.

It is one of the most widely used Python packages for 2D graphics. It is low level and gives you a lot of freedom over how a chart looks.

Since matplotlib is not part of the Python standard library, we first need to install it with the command:

python -m pip install matplotlib

We will be using the matplotlib.pyplot module.
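A quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal sketch; the version number printed depends on your environment.

```python
import matplotlib
import matplotlib.pyplot as plt  # the module used in the rest of this article

# If both imports succeed, matplotlib is installed and importable.
print("matplotlib version:", matplotlib.__version__)
```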

How to plot a line chart using matplotlib?

A line chart is useful for tracking changes over short and long periods of time. When the changes are small, a line chart shows them more clearly than most other chart types. It is also useful for comparing changes over the same period of time for more than one group.

Syntax:

import matplotlib.pyplot as plt

plt.plot(x_values,y_values)

Here x_values means values to plot on the x-axis and y_values means values to plot on the y-axis.
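As a small sketch of how the call behaves: plt.plot returns a list of Line2D objects, one per line drawn, and if only one sequence is passed it is used as the y values while x defaults to 0, 1, 2 and so on. The data below is made up for illustration, and selecting the Agg backend is an assumption made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x_values = [1, 2, 3, 4]        # illustrative data, not from the article
y_values = [10, 20, 15, 30]

# plot() returns a list with one Line2D object per line drawn
lines = plt.plot(x_values, y_values)
line = lines[0]
print(list(line.get_xdata()), list(line.get_ydata()))

# With a single sequence, matplotlib treats it as y and sets x to 0..N-1
(default_x_line,) = plt.plot(y_values)
print(list(default_x_line.get_xdata()))

plt.close("all")  # release the figure once we are done inspecting it
```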

Example:

Now let us look at the number of employees in Company A for each year from 2016 to 2021.

import matplotlib.pyplot as plt

# number of employees
emp_count = [325, 400, 530, 605, 710, 600]
year = [2016, 2017, 2018, 2019, 2020, 2021]

# plot a line chart
plt.plot(year, emp_count, 'o-g')

# set axis titles
plt.xlabel("Year")
plt.ylabel("Employees")

# set chart title
plt.title("Employee Growth")

plt.show()

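Output (figure not reproduced here): a line chart titled “Employee Growth”, with Year on the x-axis and Employees on the y-axis, drawn as a green line with circle markers.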

How to chart multiple lines in a single chart?

Now let us consider two companies, A and B, and their employee growth over the same time period.

import matplotlib.pyplot as plt

# number of employees
emp_count_A = [325, 400, 530, 605, 710, 600]
emp_count_B = [225, 310, 360, 300, 450, 560]
year = [2016, 2017, 2018, 2019, 2020, 2021]

# plot a line chart
plt.plot(year, emp_count_A, 'o-b')
plt.plot(year, emp_count_B, 'o-r')

# set axis titles
plt.xlabel("Year")
plt.ylabel("Employees")

# set chart title
plt.title("Employee Growth")

# legend
plt.legend(['A', 'B'])

plt.show()

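Output (figure not reproduced here): a line chart titled “Employee Growth” with two lines, Company A in blue and Company B in red, each with circle markers, and a legend identifying them as A and B.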

In the example above, both lines share the same axes, so we use a legend to distinguish them.
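As an alternative sketch, each line can carry its own legend text through the label keyword of plt.plot, and plt.legend() called with no arguments then picks those labels up. This avoids having to keep the list of names in the same order as the plot calls. The data is the same as in the example above; selecting the Agg backend is an assumption made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use an interactive window
import matplotlib.pyplot as plt

emp_count_A = [325, 400, 530, 605, 710, 600]
emp_count_B = [225, 310, 360, 300, 450, 560]
year = [2016, 2017, 2018, 2019, 2020, 2021]

# attach the legend text to each line as it is drawn
plt.plot(year, emp_count_A, 'o-b', label='A')
plt.plot(year, emp_count_B, 'o-r', label='B')

plt.xlabel("Year")
plt.ylabel("Employees")
plt.title("Employee Growth")

# legend() with no arguments uses the labels set above
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)  # prints ['A', 'B']
```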

In both examples, in addition to the x and y values, plt.plot receives a third argument such as ‘o-g’. This is a format string, an abbreviation for quickly setting basic line properties.

A format string consists of three parts, ‘[marker][line][color]’, each of which is optional. For example, ‘o-g’ means circle markers (o), a solid line (-) and the color green (g).
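The sketch below shows how a format string maps onto the longer keyword arguments marker, linestyle and color, and what happens when a part is left out. The x and y values are made up for illustration, and selecting the Agg backend is an assumption made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3]  # illustrative data, not from the article
y = [2, 4, 3]

# 'o-g' = circle marker, solid line, green
(short_form,) = plt.plot(x, y, 'o-g')

# the same settings spelled out as keyword arguments
(keyword_form,) = plt.plot(x, y, marker='o', linestyle='-', color='g')

# each part is optional
(dashed_red,) = plt.plot(x, y, '--r')   # dashed red line, no markers
(squares_only,) = plt.plot(x, y, 's')   # square markers only, no connecting line

# inspect what each call actually set on the line
for line in (short_form, keyword_form, dashed_red, squares_only):
    print(line.get_marker(), line.get_linestyle(), line.get_color())

plt.close("all")
```

The format string is convenient for quick plots, while the keyword form is easier to read once more properties (line width, marker size and so on) need to be set.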


Deep Insight into Big Data: Understanding Big Data Basics

What is Big Data?

Big data refers to large collections of structured, semi-structured and unstructured data, mostly collected from Internet-connected devices. It represents the massive amount of data an organization is exposed to daily, which cannot be managed by traditional database management systems. It has led to a shift from a model-driven paradigm to a data-driven paradigm. What matters is how an organization uses this data to yield insights that result in better-informed decisions. The importance of big data lies not in the amount of data but in how you use it.

Characteristics of Big Data

The term big data refers to large data sets (Volume) of structured, semi-structured and unstructured data (Variety) that arrive faster than before (Velocity). These are known as the 3Vs.

3V:

Volume:

The volume of data stored today is growing exponentially. Data volumes have grown from terabytes to zettabytes.

Velocity:

Velocity represents both the rate at which data is generated and the rate at which it needs to be handled.

Variety:

Because data is not collected from a single source, its variety differs according to the source, such as emails, the web, text or sensors, and it may be structured or unstructured.
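To give a sense of the scale mentioned under Volume, here is a small arithmetic sketch. It assumes decimal (SI) prefixes, where one terabyte is 10^12 bytes and one zettabyte is 10^21 bytes; binary units would give slightly different figures.

```python
# Decimal (SI) byte units; this choice of units is an assumption for illustration
TERABYTE = 10 ** 12
ZETTABYTE = 10 ** 21

# how many terabytes fit into one zettabyte
terabytes_per_zettabyte = ZETTABYTE // TERABYTE
print(f"1 ZB = {terabytes_per_zettabyte:,} TB")  # prints: 1 ZB = 1,000,000,000 TB
```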

As big data evolved over time, its characteristics also expanded from 3Vs to 6Vs.

6V:

As data grows tremendously in today’s internet world, today’s big data is tomorrow’s small data.