PyBrain - Importing Data For Datasets
Last Updated :
21 Feb, 2022
In this article, we will learn how to import data for datasets in PyBrain.
Datasets hold the data used to train, validate, and test networks. The type of dataset to use depends on the machine-learning task at hand. The two most commonly used datasets that PyBrain supports are SupervisedDataSet and ClassificationDataSet. As their names suggest, ClassificationDataSet is used for classification problems and SupervisedDataSet for general supervised learning tasks.
Method 1: Importing Data For Datasets Using CSV Files
This is the simplest way to import a dataset from a CSV file. For this we will use Pandas, so importing the Pandas library is a must.
Syntax: pd.read_csv('path of the csv file')
Consider the CSV file we want to import is price.csv.
Python3
import pandas as pd
print('Read data...')
# enter the complete path of the csv file
df = pd.read_csv('../price.csv', header=0).head(1000)
data = df.values
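Once the values are loaded, they can be split into input and target columns ready for a PyBrain dataset. The sketch below is illustrative only: since price.csv is not shown, it assumes a layout with feature columns first and the target in the last column, and uses a small in-memory CSV in place of the real file.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for price.csv (assumed layout:
# feature columns first, target price in the last column).
csv_text = """size,rooms,price
1200,3,250000
800,2,180000
1500,4,320000
"""

df = pd.read_csv(io.StringIO(csv_text), header=0)
data = df.values

# Split into inputs X (all but the last column) and targets y (last column),
# the per-row shape that a PyBrain SupervisedDataSet's addSample expects.
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)  # (3, 2) (3,)
```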
Method 2: Importing Data For Datasets Using Sklearn
The Sklearn library ships with many ready-made datasets. Three main kinds of dataset interfaces are available, depending on the type of dataset required.
- The dataset loaders - They can be used to load small standard datasets, described in the Toy datasets section.
Example 1: Loading the Iris dataset
Python3
from pybrain.datasets import ClassificationDataSet
from sklearn import datasets
nums = datasets.load_iris()
x, y = nums.data, nums.target
ds = ClassificationDataSet(4, 1, nb_classes=3)
for j in range(len(x)):
    ds.addSample(x[j], y[j])
ds
Output:
<pybrain.datasets.classification.ClassificationDataSet at 0x7f7004812a50>
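A common next step after loading is to split the data into training and test portions. PyBrain datasets offer splitWithProportion for this, but the arrays can also be split before wrapping them, as in this hedged sketch using sklearn's train_test_split (the test_size and random_state values are illustrative choices, not part of the original example):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load Iris again: 150 samples, 4 features, 3 classes.
nums = datasets.load_iris()
x, y = nums.data, nums.target

# 75/25 split of the raw arrays; each part can then be wrapped
# in its own ClassificationDataSet.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)
print(x_train.shape, x_test.shape)  # (112, 4) (38, 4)
```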
Example 2: Loading the digits dataset
Python3
from sklearn import datasets
from pybrain.datasets import ClassificationDataSet
digits = datasets.load_digits()
X, y = digits.data, digits.target
ds = ClassificationDataSet(64, 1, nb_classes=10)
for i in range(len(X)):
    ds.addSample(X[i].ravel(), y[i])
Output:
<pybrain.datasets.classification.ClassificationDataSet at 0x5d4054612b80>
- The dataset fetchers - They can be used to download and load larger datasets, described in the Real world datasets section.
Example:
Python3
from sklearn import datasets
datasets.fetch_california_housing
Output:
<function sklearn.datasets._california_housing.fetch_california_housing>
- The dataset generation functions - They can be used to generate controlled synthetic datasets, described in the Generated datasets section. These functions return a tuple (X, y) consisting of an n_samples x n_features NumPy array X and an array y of length n_samples containing the targets.
Example:
Python3
from sklearn.datasets import make_moons
from matplotlib import pyplot as plt

X, y = make_moons(n_samples=1000, noise=0.1)
plt.scatter(X[:, 0], X[:, 1], s=40, color='g')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
plt.clf()
Output: A scatter plot of the two interleaving half-moon clusters generated by make_moons.
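The (X, y) pair returned by make_moons follows the contract described above, which can be verified directly before feeding the arrays into a PyBrain dataset. A minimal check (the plotting step is omitted here):

```python
from sklearn.datasets import make_moons

# Generate 1000 two-dimensional samples in two interleaving half-moons.
X, y = make_moons(n_samples=1000, noise=0.1)

# X is n_samples x n_features; y holds one class label (0 or 1) per sample.
print(X.shape)         # (1000, 2)
print(sorted(set(y)))  # [0, 1]
```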