Package de.edux.data.provider
Class DataProcessor
java.lang.Object
de.edux.data.provider.DataProcessor
- All Implemented Interfaces:
Dataloader, DataPostProcessor, Dataset
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description:
- String[] getColumnDataOf(int columnIndex)
- getDataset: Retrieves the processed dataset as a list of string arrays.
- double[][] getInputs
- double[][] getTargets(List<String[]> dataset, int targetColumn)
- double[][] getTestFeatures(int[] inputColumns)
- double[][] getTestLabels(int targetColumn)
- double[][] getTrainFeatures(int[] inputColumns)
- double[][] getTrainLabels(int targetColumn)
- imputation(int columnIndex, ImputationStrategy imputationStrategy): Performs imputation on missing values in a specified column index using the provided imputation strategy.
- DataProcessor loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn): Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
- normalize: Normalizes the dataset.
- void performListWiseDeletion(): Performs list-wise deletion on the dataset.
- shuffle(): Shuffles the dataset randomly.
- split(double splitRatio): Splits the dataset into two separate datasets according to the specified split ratio.
-
Constructor Details
-
DataProcessor
-
-
Method Details
-
split
Description copied from interface: DataPostProcessor
Splits the dataset into two separate datasets according to the specified split ratio. The split ratio determines the proportion of data to be used for the first dataset (e.g., training set).
- Specified by:
split in interface DataPostProcessor
- Parameters:
splitRatio - the ratio for splitting the dataset, where 0 < splitRatio < 1
- Returns:
a DataProcessor instance containing the first portion of the dataset according to the split ratio
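The ratio-based split described for this method can be illustrated with a minimal, self-contained sketch. It is not the DataProcessor implementation: the class SplitSketch, its split helper, and the choice to round the boundary index down are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of de.edux. */
final class SplitSketch {

    /**
     * Splits rows into two lists. Element 0 holds the first portion
     * (e.g. the training set), element 1 holds the remainder.
     */
    static List<List<String[]>> split(List<String[]> rows, double splitRatio) {
        if (!(splitRatio > 0.0 && splitRatio < 1.0)) {
            throw new IllegalArgumentException("splitRatio must satisfy 0 < splitRatio < 1");
        }
        // Assumption: the boundary index is rounded down.
        int boundary = (int) (rows.size() * splitRatio);
        List<List<String[]>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(rows.subList(0, boundary)));
        parts.add(new ArrayList<>(rows.subList(boundary, rows.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new String[] {String.valueOf(i)});
        }
        List<List<String[]>> parts = split(rows, 0.8);
        System.out.println(parts.get(0).size() + " / " + parts.get(1).size());
    }
}
```

Because the split takes the leading rows as the first portion, shuffling the dataset beforehand is usually what keeps both portions representative.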
-
loadDataSetFromCSV
public DataProcessor loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)
Description copied from interface: Dataloader
Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
- Specified by:
loadDataSetFromCSV in interface Dataloader
- Parameters:
csvFile - the CSV file to load the data from
csvSeparator - the character that separates values in a row in the CSV file
skipHead - a boolean indicating whether to skip the header row (true) or not (false)
inputColumns - an array of indexes indicating which columns to include as input features
targetColumn - the index of the column to use as the output label or target for predictions
- Returns:
a DataProcessor object that contains the processed data
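To make the roles of csvSeparator, skipHead, inputColumns, and targetColumn concrete, here is a rough, self-contained sketch of how such parameters are commonly applied. It does not call DataProcessor. CsvLoadSketch, readRows, and selectColumns are hypothetical names, the sketch reads from a Reader rather than a File so that it runs without any file on disk, and it assumes plain unquoted CSV cells.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not part of de.edux. */
final class CsvLoadSketch {

    /** Reads all non-empty rows, optionally dropping the first (header) line. */
    static List<String[]> readRows(Reader source, char csvSeparator, boolean skipHead) {
        // Quote the separator so characters such as '|' or '.' are matched literally.
        String delimiter = Pattern.quote(String.valueOf(csvSeparator));
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            if (skipHead && line != null) {
                line = reader.readLine(); // drop the header row
            }
            while (line != null) {
                if (!line.trim().isEmpty()) {
                    // Limit -1 keeps trailing empty cells, so missing values stay visible.
                    rows.add(line.split(delimiter, -1));
                }
                line = reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Converts the chosen input columns to numeric features. */
    static double[][] selectColumns(List<String[]> rows, int[] inputColumns) {
        double[][] features = new double[rows.size()][inputColumns.length];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < inputColumns.length; c++) {
                features[r][c] = Double.parseDouble(rows.get(r)[inputColumns[c]]);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String csv = "length;width;species\n5.1;3.5;setosa\n6.2;2.9;versicolor\n";
        List<String[]> rows = readRows(new StringReader(csv), ';', true);
        double[][] features = selectColumns(rows, new int[] {0, 1});
        int targetColumn = 2;
        for (int r = 0; r < rows.size(); r++) {
            System.out.println(Arrays.toString(features[r]) + " -> " + rows.get(r)[targetColumn]);
        }
    }
}
```

The column indexes are zero-based in this sketch. Whether DataProcessor uses the same convention is not stated in this page.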
-
normalize
Description copied from interface: DataPostProcessor
Normalizes the dataset. This typically involves scaling the values of numeric attributes so that they share a common scale, often between 0 and 1, without distorting differences in the ranges of values.
- Specified by:
normalize in interface DataPostProcessor
- Returns:
the DataPostProcessor instance with normalized data for method chaining
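One common way to scale values into the range 0 to 1 is min-max scaling per column. The sketch below shows that technique under stated assumptions: this page does not say which scaling DataProcessor applies, and MinMaxSketch and its normalize helper are hypothetical names.

```java
import java.util.Arrays;

/** Illustrative only: min-max scaling as one possible normalization. */
final class MinMaxSketch {

    /** Returns a new matrix in which every column is scaled to the range [0, 1]. */
    static double[][] normalize(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int columns = data[0].length;
        double[][] scaled = new double[data.length][columns];
        for (int c = 0; c < columns; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < data.length; r++) {
                // Assumption: a constant column maps to 0 to avoid dividing by zero.
                scaled[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[][] scaled = normalize(new double[][] {{1, 10}, {2, 20}, {3, 30}});
        for (double[] row : scaled) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```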
-
shuffle
Description copied from interface: DataPostProcessor
Shuffles the dataset randomly. This is usually done to ensure that the data does not carry any inherent bias in the order it was collected or presented.
- Specified by:
shuffle in interface DataPostProcessor
- Returns:
the DataPostProcessor instance with shuffled data for method chaining
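As a sketch of what row shuffling amounts to, the example below uses java.util.Collections.shuffle with a seeded Random so that the result is reproducible. The seed parameter and the ShuffleSketch class are assumptions for illustration. The documented shuffle() takes no arguments, and its source of randomness is not specified in this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only: a hypothetical helper, not part of de.edux. */
final class ShuffleSketch {

    /** Returns a shuffled copy of rows. The same seed always gives the same order. */
    static List<String[]> shuffled(List<String[]> rows, long seed) {
        List<String[]> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new String[] {"row" + i});
        }
        for (String[] row : shuffled(rows, 42L)) {
            System.out.println(row[0]);
        }
    }
}
```

A fixed seed is mainly useful in tests and experiments where the same train/test partition has to be reproduced.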
-
getDataset
Description copied from interface: DataPostProcessor
Retrieves the processed dataset as a list of string arrays. Each string array represents a row in the dataset.
- Specified by:
getDataset in interface DataPostProcessor
- Returns:
a list of string arrays representing the dataset
-
getInputs
-
getTargets
- Specified by:
getTargets in interface Dataset
-
getClassMap
- Specified by:
getClassMap in interface Dataset
-
getColumnDataOf
- Specified by:
getColumnDataOf in interface Dataset
-
imputation
Description copied from interface: DataPostProcessor
Performs imputation on missing values in a specified column index using the provided imputation strategy.
- Specified by:
imputation in interface DataPostProcessor
- Parameters:
columnIndex - the index of the column to apply imputation
imputationStrategy - the strategy to use for imputing missing values
- Returns:
the DataPostProcessor instance with imputed data for method chaining
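The available ImputationStrategy values are not listed in this page, so the sketch below picks mean imputation purely as an example of filling missing cells in one column. MeanImputationSketch, imputeMean, and the rule that null or blank cells count as missing are all assumptions, not the library's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: mean imputation as one possible strategy. */
final class MeanImputationSketch {

    /** Assumption: a cell is missing when it is null or blank. */
    static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** Replaces missing cells of one column with the mean of the present values, in place. */
    static void imputeMean(List<String[]> rows, int columnIndex) {
        double sum = 0.0;
        int count = 0;
        for (String[] row : rows) {
            if (!isMissing(row[columnIndex])) {
                sum += Double.parseDouble(row[columnIndex]);
                count++;
            }
        }
        if (count == 0) {
            return; // nothing to compute a mean from, leave the column unchanged
        }
        String mean = String.valueOf(sum / count);
        for (String[] row : rows) {
            if (isMissing(row[columnIndex])) {
                row[columnIndex] = mean;
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1.0", "a"});
        rows.add(new String[] {"", "b"});
        rows.add(new String[] {"3.0", "c"});
        imputeMean(rows, 0);
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```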
-
performListWiseDeletion
public void performListWiseDeletion()
Description copied from interface: DataPostProcessor
Performs list-wise deletion on the dataset. This involves removing any rows with missing values to ensure the dataset is complete. This method modifies the dataset in place and does not return a value.
- Specified by:
performListWiseDeletion in interface DataPostProcessor
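List-wise deletion as described here can be sketched in a few lines of plain Java. The example is not the library code: ListWiseDeletionSketch is a hypothetical class, and treating null or blank cells as missing is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of de.edux. */
final class ListWiseDeletionSketch {

    /** Removes, in place, every row that contains at least one missing cell. */
    static void deleteIncompleteRows(List<String[]> rows) {
        rows.removeIf(row -> Arrays.stream(row)
                .anyMatch(cell -> cell == null || cell.trim().isEmpty()));
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"5.1", "3.5", "setosa"});
        rows.add(new String[] {"6.2", "", "versicolor"});
        rows.add(new String[] {"5.9", "3.0", "virginica"});
        deleteIncompleteRows(rows);
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Dropping rows is simple but discards data. When many rows have gaps, imputation is the usual alternative.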
-
getTrainFeatures
public double[][] getTrainFeatures(int[] inputColumns)
- Specified by:
getTrainFeatures in interface Dataset
-
getTrainLabels
public double[][] getTrainLabels(int targetColumn)
- Specified by:
getTrainLabels in interface Dataset
-
getTestFeatures
public double[][] getTestFeatures(int[] inputColumns)
- Specified by:
getTestFeatures in interface Dataset
-
getTestLabels
public double[][] getTestLabels(int targetColumn)
- Specified by:
getTestLabels in interface Dataset
-