java.lang.Object

de.edux.data.provider.DataProcessor

All Implemented Interfaces: Dataloader, DataPostProcessor, Dataset

public class DataProcessor extends Object implements DataPostProcessor, Dataset, Dataloader

Constructor Summary

Constructors

Constructor

Description

DataProcessor(IDataReader dataReader)
Method Summary

Modifier and Type

Method

Description

Map<String,Integer>

getClassMap()

String[]

getColumnDataOf(int columnIndex)

List<String[]>

getDataset()

Retrieves the processed dataset as a list of string arrays.

double[][]

getInputs(List<String[]> dataset, int[] inputColumns)

double[][]

getTargets(List<String[]> dataset, int targetColumn)

double[][]

getTestFeatures(int[] inputColumns)

double[][]

getTestLabels(int targetColumn)

double[][]

getTrainFeatures(int[] inputColumns)

double[][]

getTrainLabels(int targetColumn)

DataPostProcessor

imputation(int columnIndex, ImputationStrategy imputationStrategy)

Performs imputation on missing values in a specified column index using the provided imputation strategy.

DataProcessor

loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)

Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.

DataPostProcessor

normalize()

Normalizes the dataset.

void

performListWiseDeletion()

Performs list-wise deletion on the dataset.

DataPostProcessor

shuffle()

Shuffles the dataset randomly.

DataProcessor

split(double splitRatio)

Splits the dataset into two separate datasets according to the specified split ratio.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- DataProcessor
  
  public DataProcessor(IDataReader dataReader)
Method Details
- split
  
  public DataProcessor split(double splitRatio)
  
  Description copied from interface: DataPostProcessor
  
  Splits the dataset into two separate datasets according to the specified split ratio. The split ratio determines the proportion of data to be used for the first dataset (e.g., training set).
  
  Specified by:
  
  split in interface DataPostProcessor
  
  Parameters:
  
  splitRatio - the ratio for splitting the dataset, where 0 < splitRatio < 1
  
  Returns:
  
  a DataProcessor instance containing the first portion of the dataset according to the split ratio
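The split semantics described above can be illustrated with a minimal, library-independent sketch. The class `SplitSketch` and its methods are hypothetical, and the cut index (`size * splitRatio`, truncated) is an assumption; the source does not state how `DataProcessor` rounds the boundary.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {

    // Index of the first row that belongs to the second portion.
    // Assumption: the boundary is size * splitRatio, truncated toward zero.
    static int cutIndex(int size, double splitRatio) {
        if (!(splitRatio > 0.0 && splitRatio < 1.0)) {
            throw new IllegalArgumentException("splitRatio must satisfy 0 < splitRatio < 1");
        }
        return (int) (size * splitRatio);
    }

    // First portion of the rows (e.g. the training set).
    static List<String[]> firstPortion(List<String[]> rows, double splitRatio) {
        return new ArrayList<>(rows.subList(0, cutIndex(rows.size(), splitRatio)));
    }

    // Remaining rows (e.g. the test set).
    static List<String[]> remainder(List<String[]> rows, double splitRatio) {
        return new ArrayList<>(rows.subList(cutIndex(rows.size(), splitRatio), rows.size()));
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            rows.add(new String[] {String.valueOf(i)});
        }
        System.out.println(firstPortion(rows, 0.75).size() + " / " + remainder(rows, 0.75).size());
    }
}
```

Shuffling before splitting is the usual order, since otherwise both portions inherit whatever ordering the file had.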
- loadDataSetFromCSV
  
  public DataProcessor loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)
  
  Description copied from interface: Dataloader
  
  Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
  
  Specified by:
  
  loadDataSetFromCSV in interface Dataloader
  
  Parameters:
  
  csvFile - the CSV file to load the data from
  
  csvSeparator - the character that separates values in a row in the CSV file
  
  skipHead - a boolean indicating whether to skip the header row (true) or not (false)
  
  inputColumns - an array of indexes indicating which columns to include as input features
  
  targetColumn - the index of the column to use as the output label or target for predictions
  
  Returns:
  
  a DataProcessor object that contains the processed data
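To make the roles of `csvSeparator` and `skipHead` concrete, here is a rough sketch of reading separated rows into the `List<String[]>` shape that `getDataset()` returns. `CsvLoadSketch.readRows` is a hypothetical helper, not the library's reader: it takes a `Reader` instead of a `File`, splits naively on the separator (no quoted-field handling), and skips blank lines, all of which are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CsvLoadSketch {

    // Reads all rows, optionally skipping the first line as a header.
    static List<String[]> readRows(Reader source, char csvSeparator, boolean skipHead) {
        // Quote the separator so characters such as '|' or '.' are not treated as regex syntax.
        String delimiter = Pattern.quote(String.valueOf(csvSeparator));
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                boolean isHeader = firstLine && skipHead;
                firstLine = false;
                if (isHeader || line.trim().isEmpty()) {
                    continue;
                }
                // Limit -1 keeps trailing empty cells, so missing values stay visible.
                rows.add(line.split(delimiter, -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "x;y;label\n1;2;a\n3;;b\n";
        List<String[]> rows = readRows(new StringReader(csv), ';', true);
        System.out.println(rows.size() + " data rows");
    }
}
```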
- normalize
  
  public DataPostProcessor normalize()
  
  Description copied from interface: DataPostProcessor
  
  Normalizes the dataset. This typically involves scaling the values of numeric attributes so that they share a common scale, often between 0 and 1, without distorting differences in the ranges of values.
  
  Specified by:
  
  normalize in interface DataPostProcessor
  
  Returns:
  
  the DataPostProcessor instance with normalized data for method chaining
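The description mentions scaling to a common range, often between 0 and 1. One common way to do that is min-max scaling, sketched below for a single numeric column. Whether `normalize()` uses exactly this formula is not stated in the source; `MinMaxSketch` is a hypothetical helper, and mapping a constant column to 0.0 is an assumption.

```java
final class MinMaxSketch {

    // Scales values to [0, 1] via (v - min) / (max - min).
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant column (range 0) is mapped to 0.0 to avoid division by zero.
            scaled[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(minMax(new double[] {2.0, 4.0, 6.0})));
        // prints [0.0, 0.5, 1.0]
    }
}
```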
- shuffle
  
  public DataPostProcessor shuffle()
  
  Description copied from interface: DataPostProcessor
  
  Shuffles the dataset randomly. This is usually done to ensure that the data does not carry any inherent bias in the order it was collected or presented.
  
  Specified by:
  
  shuffle in interface DataPostProcessor
  
  Returns:
  
  the DataPostProcessor instance with shuffled data for method chaining
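Row shuffling can be sketched with the standard `Collections.shuffle`. The helper `ShuffleSketch.shuffled` is hypothetical, and the explicit seed is an addition for reproducibility; the `shuffle()` method documented here takes no arguments, and the source does not say how it seeds its random generator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSketch {

    // Returns a shuffled copy of the rows; the input list is left untouched.
    static List<String[]> shuffled(List<String[]> rows, long seed) {
        List<String[]> copy = new ArrayList<>(rows);
        // A fixed seed makes the permutation repeatable across runs.
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new String[] {String.valueOf(i)});
        }
        for (String[] row : shuffled(rows, 42L)) {
            System.out.print(row[0] + " ");
        }
        System.out.println();
    }
}
```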
- getDataset
  
  public List<String[]> getDataset()
  
  Description copied from interface: DataPostProcessor
  
  Retrieves the processed dataset as a list of string arrays. Each string array represents a row in the dataset.
  
  Specified by:
  
  getDataset in interface DataPostProcessor
  
  Returns:
  
  a list of string arrays representing the dataset
- getInputs
  
  public double[][] getInputs(List<String[]> dataset, int[] inputColumns)
  
  Specified by:
  
  getInputs in interface Dataset
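The source gives no description for `getInputs`, but the signature suggests selecting the given columns from each row and converting them to numbers. The sketch below shows that reading of it; `InputsSketch.selectInputs` is a hypothetical helper, and parsing with `Double.parseDouble` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class InputsSketch {

    // Builds a rows-by-features matrix from the selected columns.
    static double[][] selectInputs(List<String[]> dataset, int[] inputColumns) {
        double[][] inputs = new double[dataset.size()][inputColumns.length];
        for (int r = 0; r < dataset.size(); r++) {
            String[] row = dataset.get(r);
            for (int c = 0; c < inputColumns.length; c++) {
                // Throws NumberFormatException for non-numeric or missing cells.
                inputs[r][c] = Double.parseDouble(row[inputColumns[c]].trim());
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<String[]> dataset = new ArrayList<>();
        dataset.add(new String[] {"1.5", "x", "2"});
        dataset.add(new String[] {"3", "y", "4.5"});
        double[][] inputs = selectInputs(dataset, new int[] {0, 2});
        System.out.println(java.util.Arrays.deepToString(inputs));
        // prints [[1.5, 2.0], [3.0, 4.5]]
    }
}
```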
- getTargets
  
  public double[][] getTargets(List<String[]> dataset, int targetColumn)
  
  Specified by:
  
  getTargets in interface Dataset
- getClassMap
  
  public Map<String,Integer> getClassMap()
  
  Specified by:
  
  getClassMap in interface Dataset
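Neither `getClassMap` nor `getTargets` is described in the source. Going only by the return types (`Map<String,Integer>` and `double[][]`), one plausible interpretation is a label-to-index map combined with one-hot encoded targets. The sketch below illustrates that interpretation; `ClassMapSketch` and both of its methods are hypothetical, and the first-seen index order is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClassMapSketch {

    // Assigns each distinct label an index in order of first appearance.
    static Map<String, Integer> buildClassMap(List<String[]> dataset, int targetColumn) {
        Map<String, Integer> classMap = new LinkedHashMap<>();
        for (String[] row : dataset) {
            classMap.putIfAbsent(row[targetColumn], classMap.size());
        }
        return classMap;
    }

    // Encodes each row's label as a vector with a single 1.0 at the label's index.
    static double[][] oneHot(List<String[]> dataset, int targetColumn, Map<String, Integer> classMap) {
        double[][] targets = new double[dataset.size()][classMap.size()];
        for (int r = 0; r < dataset.size(); r++) {
            targets[r][classMap.get(dataset.get(r)[targetColumn])] = 1.0;
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String[]> dataset = new ArrayList<>();
        dataset.add(new String[] {"5.1", "setosa"});
        dataset.add(new String[] {"6.0", "versicolor"});
        dataset.add(new String[] {"4.9", "setosa"});
        Map<String, Integer> classMap = buildClassMap(dataset, 1);
        System.out.println(classMap);
        // prints {setosa=0, versicolor=1}
        System.out.println(java.util.Arrays.deepToString(oneHot(dataset, 1, classMap)));
        // prints [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    }
}
```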
- getColumnDataOf
  
  public String[] getColumnDataOf(int columnIndex)
  
  Specified by:
  
  getColumnDataOf in interface Dataset
- imputation
  
  public DataPostProcessor imputation(int columnIndex, ImputationStrategy imputationStrategy)
  
  Description copied from interface: DataPostProcessor
  
  Performs imputation on missing values in a specified column index using the provided imputation strategy.
  
  Specified by:
  
  imputation in interface DataPostProcessor
  
  Parameters:
  
  columnIndex - the index of the column to apply imputation
  
  imputationStrategy - the strategy to use for imputing missing values
  
  Returns:
  
  the DataPostProcessor instance with imputed data for method chaining
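As an illustration of column-wise imputation, the sketch below fills missing cells with the column mean. This is one common strategy and is not taken from the source, which does not list the values of `ImputationStrategy`. `MeanImputationSketch` is a hypothetical helper, and treating `null` or blank cells as missing is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class MeanImputationSketch {

    // Assumption: a cell counts as missing when it is null or blank.
    static boolean isMissing(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    // Replaces missing cells in one column with the mean of the present values, in place.
    static void imputeMean(List<String[]> rows, int columnIndex) {
        double sum = 0.0;
        int count = 0;
        for (String[] row : rows) {
            if (!isMissing(row[columnIndex])) {
                sum += Double.parseDouble(row[columnIndex].trim());
                count++;
            }
        }
        if (count == 0) {
            return; // nothing to average, leave the column unchanged
        }
        String mean = String.valueOf(sum / count);
        for (String[] row : rows) {
            if (isMissing(row[columnIndex])) {
                row[columnIndex] = mean;
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1.0", "a"});
        rows.add(new String[] {"", "b"});
        rows.add(new String[] {"3.0", "c"});
        imputeMean(rows, 0);
        System.out.println(rows.get(1)[0]);
        // prints 2.0
    }
}
```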
- performListWiseDeletion
  
  public void performListWiseDeletion()
  
  Description copied from interface: DataPostProcessor
  
  Performs list-wise deletion on the dataset. This involves removing any rows with missing values to ensure the dataset is complete. This method modifies the dataset in place and does not return a value.
  
  Specified by:
  
  performListWiseDeletion in interface DataPostProcessor
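List-wise deletion as described above (drop every row that has any missing value, modifying the dataset in place) can be sketched in a few lines. `ListWiseDeletionSketch` is a hypothetical helper; what the library counts as "missing" is not stated in the source, so `null` or blank cells are assumed here.

```java
import java.util.ArrayList;
import java.util.List;

final class ListWiseDeletionSketch {

    // Assumption: a row is incomplete when any cell is null or blank.
    static boolean hasMissingValue(String[] row) {
        for (String cell : row) {
            if (cell == null || cell.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Removes incomplete rows from the list in place; the list must be mutable.
    static void dropIncompleteRows(List<String[]> rows) {
        rows.removeIf(ListWiseDeletionSketch::hasMissingValue);
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "a"});
        rows.add(new String[] {"", "b"});
        rows.add(new String[] {"3", "c"});
        dropIncompleteRows(rows);
        System.out.println(rows.size());
        // prints 2
    }
}
```

Compared with imputation, this keeps only fully observed rows at the cost of discarding data, so it is usually applied when few rows are affected.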
- getTrainFeatures
  
  public double[][] getTrainFeatures(int[] inputColumns)
  
  Specified by:
  
  getTrainFeatures in interface Dataset
- getTrainLabels
  
  public double[][] getTrainLabels(int targetColumn)
  
  Specified by:
  
  getTrainLabels in interface Dataset
- getTestFeatures
  
  public double[][] getTestFeatures(int[] inputColumns)
  
  Specified by:
  
  getTestFeatures in interface Dataset
- getTestLabels
  
  public double[][] getTestLabels(int targetColumn)
  
  Specified by:
  
  getTestLabels in interface Dataset
