preloader


Contact Us

Python Data File Formats: How to Read CSV, JSON, and Excel Files

Python Data File Formats: How to Read CSV, JSON, and Excel Files Image

In all fields, including business, science, education, and research, data is a crucial component. The data are not in the same format. We work with many forms of information, even within a single domain. In this blog, learn how to read common file formats in Python, including CSV, JSON, and Excel files.

Python supports built-in functions for creating, writing, and retrieving files. Normal text files and binary files, which are written in a binary language with 0s and 1s, can both be handled by Python.

  • Text files: In these types of files, each line of text is ended by a unique character called EOL (End of Line), which in Python is equivalent to the new line character ('n').
  • Binary files: A line does not have a terminator in a binary file, and the data is saved after being translated into machine-readable binary language.

Reading Text Files in Python

One of the most used file types for storing data is text files. It's quite simple to read data from text files using Python. The open() method in Python can be used to read files and accepts two parameters: the file path and the file access mode. The file access mode for reading a text file is "r." Below, I've listed the additional access methods:

  • ‘w’ – writing to a file
  • ‘r+’ – read to a file
  • ‘a’ – appending to an existing file
  • ‘a+’ – add a file extension after reading

Three functions are available in Python to read data from a text file:

1. read(): This function returns the read bytes as a string. reads n bytes; if no value for n is given, reads the whole file.

File_object.read([n])

2..readline(): Returns a string after reading a line from the file. reads a maximum of n bytes for the provided n. Even if n is greater than the length of the line, it only reads one line.

File_object.readline([n])

3. readlines(): Reads all the lines and returns them as each line's string element in a list.

   File_object.readlines()

How do these functions differ when reading a text file? Let's find out.

# read text file

with open(r'./Importing files/Filename.txt','r') as f:

print(f.read())

The read() function imported all the data in the file in the correct structured form

Reading CSV Files in Python

The most frequent type of file is a CSV (Comma Separated Value) file. Each row in a CSV file is a data record, and the values in these files are separated by the character ",". These are effective for moving data between applications, However, if the values have commas in them, CSV may encounter issues. This can be avoided by using various delimiters, such as "t" or ";" to separate the information in the file. These can also be imported using the read csv() method when reading a TSV (Tab Separated Values) file by setting the delimiter in the argument value as shown below:

import pandas as pd

df = pd.read_csv(r'./Importing files/Filename.txt',delimiter='\t')

df

Reading Excel Files in Python

The vast majority of you are probably pretty familiar with Excel files and the reasons they are so frequently used to store tabular data. So I'll get into coding and use Pandas to import an Excel file into Python.

To read Excel files, Pandas has a beneficial function named read excel():

# read Excel file into a DataFrame

df = pd.read_excel(r'./Importing files/Filename.xlsx')

# print values

df

To do this, we can use the ExcelFile() method in the Pandas library to display the names of each sheet in the file:

# read Excel sheets in pandas

xl = pd.ExcelFile(r'./Importing files/Filename.xlsx')

# print sheet name

xl.sheet_names

After doing that, we can easily read data from any sheet we wish by providing its name in the sheet_name parameter in the read_excel() function:

# read Europe sheet

df = pd.read_excel(r'./Importing files/Filename.xlsx',sheet_name='Europe')

df

Working with JSON Files in Python

For storing and exchanging data, JSON (JavaScript Object Notation) files are small and readable by individuals. These files are easy for computers to understand  since they are built in the JavaScript programming language.

Similar to how a Python dictionary holds data, JSON files also contain data. Their main advantage is that they can be used with any programming language, including Python, C, and even Java.

{

"student": 

[

{ "id":"01", "name": "Tom", "lastname": "Price"},

  {"id":"02", "name": "Nick", "lastname": "Thameson" } 

}

To read JSON files, Python has the "JSON" module. JSON files can be read in the same way as plain text files. The json.load() function, which returns a JSON dictionary, is used in this instance instead of the read function. After that, you can quickly convert it to a Panda data frame using the following function:

import json

# open json file

with open('./Importing files/sample_json.json','r') as file:

     data = json.load(file)

# json dictionary

print(type(data))

# loading into a DataFrame

df_json = pd.DataFrame(data)

df_json

You can even load the JSON file directly into a data frame using the pandas.read_json() function, as shown below:

# reading directly into a DataFrame using pd.read_json()

path = './Importing files/sample_json.json'

df = pd.read_json(path)

df

Reading Data from Pickle Files in Python

Python objects are processed and stored in pickle files. This implies that before being stored on the disc, objects like lists, sets, tuples, etc. are transformed into a character stream. This makes it possible for you to use the things later on. When your machine learning model has been developed and you want to use it for predictions in the future, these are especially helpful.

Therefore, if the files were serialized before being saved, you must de-serialize them before using them in your Python scripts. The pickle module's pickle.load() function is used for this. However, to read the binary file while using Python's open() method, you must specify the 'rb' parameter.

import pickle

with open('./Importing files/sample_pickle.pkl','rb') as file:

     data = pickle.load(file)

# pickle data

print(type(data))

df_pkl = pd.DataFrame(data)

df_pkl

Conclusion

In this article, we have covered Python data file formats and how to read CSV, JSON,  and Excel files. Please check the below link to learn more about Python MySQL - Database Connection

 

Share

Top Stories