Package de.edux.data.provider
Class DataProcessor
java.lang.Object
de.edux.data.provider.DataProcessor
All Implemented Interfaces:
Dataloader, DataPostProcessor, Dataset
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method, Description (one method per line):
getClassMap
String[]  getColumnDataOf(int columnIndex)
List<String[]>  getDataset()  Retrieves the processed dataset as a list of string arrays.
double[][]  getInputs
double[][]  getTargets(List<String[]> dataset, int targetColumn)
double[][]  getTestFeatures(int[] inputColumns)
double[][]  getTestLabels(int targetColumn)
double[][]  getTrainFeatures(int[] inputColumns)
double[][]  getTrainLabels(int targetColumn)
DataPostProcessor  imputation(int columnIndex, ImputationStrategy imputationStrategy)  Performs imputation on missing values in a specified column index using the provided imputation strategy.
DataProcessor  loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)  Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
DataPostProcessor  normalize()  Normalizes the dataset.
void  performListWiseDeletion()  Performs list-wise deletion on the dataset.
DataPostProcessor  shuffle()  Shuffles the dataset randomly.
DataProcessor  split(double splitRatio)  Splits the dataset into two separate datasets according to the specified split ratio.
-
Constructor Details
-
DataProcessor
-
-
Method Details
-
split
Description copied from interface: DataPostProcessor
Splits the dataset into two separate datasets according to the specified split ratio. The split ratio determines the proportion of data to be used for the first dataset (e.g., training set).
Specified by:
split in interface DataPostProcessor
Parameters:
splitRatio - the ratio for splitting the dataset, where 0 < splitRatio < 1
Returns:
a DataProcessor instance containing the first portion of the dataset according to the split ratio
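The ratio-based split described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name SplitSketch, the two-element return value, and the choice to truncate the cut index are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    /**
     * Splits rows into two lists. The first list receives roughly
     * rows.size() * splitRatio rows, the second list receives the rest.
     */
    static List<List<String[]>> split(List<String[]> rows, double splitRatio) {
        if (!(splitRatio > 0.0 && splitRatio < 1.0)) {
            throw new IllegalArgumentException("splitRatio must satisfy 0 < splitRatio < 1");
        }
        // Assumption: the cut index is truncated toward zero.
        int cut = (int) (rows.size() * splitRatio);
        List<List<String[]>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(rows.subList(0, cut)));           // e.g. training rows
        parts.add(new ArrayList<>(rows.subList(cut, rows.size()))); // e.g. test rows
        return parts;
    }
}
```

Because the split is positional, any ordering in the source file carries over into both parts, which is why shuffling usually comes first.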
-
loadDataSetFromCSV
public DataProcessor loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)
Description copied from interface: Dataloader
Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
Specified by:
loadDataSetFromCSV in interface Dataloader
Parameters:
csvFile - the CSV file to load the data from
csvSeparator - the character that separates values in a row in the CSV file
skipHead - a boolean indicating whether to skip the header row (true) or not (false)
inputColumns - an array of indexes indicating which columns to include as input features
targetColumn - the index of the column to use as the output label or target for predictions
Returns:
a DataProcessor object that contains the processed data
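To make the roles of csvSeparator and skipHead concrete, here is a minimal sketch of separator-based row parsing. It is an illustration under assumptions, not the library's loader: the class CsvSketch and its parse method are hypothetical, the input is a list of lines rather than a File, and quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CsvSketch {
    /** Parses CSV lines into rows of string cells, optionally skipping the header line. */
    static List<String[]> parse(List<String> lines, char csvSeparator, boolean skipHead) {
        // Quote the separator so characters such as '|' or '.' are matched literally.
        String delimiter = Pattern.quote(String.valueOf(csvSeparator));
        List<String[]> rows = new ArrayList<>();
        for (int i = skipHead ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // Limit -1 keeps trailing empty cells, so missing values stay visible.
            rows.add(line.split(delimiter, -1));
        }
        return rows;
    }
}
```

For a real file, the lines could come from java.nio.file.Files.readAllLines(csvFile.toPath()).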
-
normalize
Description copied from interface: DataPostProcessor
Normalizes the dataset. This typically involves scaling the values of numeric attributes so that they share a common scale, often between 0 and 1, without distorting differences in the ranges of values.
Specified by:
normalize in interface DataPostProcessor
Returns:
the DataPostProcessor instance with normalized data for method chaining
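One common way to bring numeric columns onto a 0 to 1 scale is min-max scaling. The sketch below assumes that technique; the page does not state which normalization the library actually applies, and the class NormalizeSketch is hypothetical.

```java
final class NormalizeSketch {
    /** Returns a copy of data in which every column is min-max scaled to [0, 1]. */
    static double[][] minMax(double[][] data) {
        int rows = data.length;
        if (rows == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < rows; r++) {
                // Assumption: a constant column maps to 0 to avoid dividing by zero.
                out[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return out;
    }
}
```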
-
shuffle
Description copied from interface: DataPostProcessor
Shuffles the dataset randomly. This is usually done to ensure that the data does not carry any inherent bias in the order it was collected or presented.
Specified by:
shuffle in interface DataPostProcessor
Returns:
the DataPostProcessor instance with shuffled data for method chaining
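Row shuffling can be sketched with the standard library alone. The seeded variant below is an assumption chosen so that runs are reproducible; the documented shuffle() takes no seed, and the class ShuffleSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSketch {
    /** Returns a shuffled copy of rows; the same seed always yields the same order. */
    static List<String[]> shuffled(List<String[]> rows, long seed) {
        List<String[]> copy = new ArrayList<>(rows); // leave the caller's list untouched
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```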
-
getDataset
Description copied from interface: DataPostProcessor
Retrieves the processed dataset as a list of string arrays. Each string array represents a row in the dataset.
Specified by:
getDataset in interface DataPostProcessor
Returns:
a list of string arrays representing the dataset
-
getInputs
-
getTargets
Specified by:
getTargets in interface Dataset
-
getClassMap
Specified by:
getClassMap in interface Dataset
-
getColumnDataOf
Specified by:
getColumnDataOf in interface Dataset
-
imputation
Description copied from interface: DataPostProcessor
Performs imputation on missing values in a specified column index using the provided imputation strategy.
Specified by:
imputation in interface DataPostProcessor
Parameters:
columnIndex - the index of the column to apply imputation
imputationStrategy - the strategy to use for imputing missing values
Returns:
the DataPostProcessor instance with imputed data for method chaining
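As an illustration of what imputing a single column can look like, the sketch below replaces missing cells with the column mean. This is only one possible strategy. The values of ImputationStrategy are not listed on this page, and the class ImputationSketch and its treatment of null or blank cells as missing are assumptions.

```java
import java.util.List;

final class ImputationSketch {
    /** Replaces missing cells in the given column with the mean of the present values. */
    static void imputeMean(List<String[]> rows, int columnIndex) {
        double sum = 0.0;
        int count = 0;
        for (String[] row : rows) {
            if (!isMissing(row[columnIndex])) {
                sum += Double.parseDouble(row[columnIndex]);
                count++;
            }
        }
        if (count == 0) {
            return; // nothing to compute a mean from
        }
        String mean = String.valueOf(sum / count);
        for (String[] row : rows) {
            if (isMissing(row[columnIndex])) {
                row[columnIndex] = mean;
            }
        }
    }

    /** Assumption: a cell counts as missing when it is null or blank. */
    static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```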
-
performListWiseDeletion
public void performListWiseDeletion()
Description copied from interface: DataPostProcessor
Performs list-wise deletion on the dataset. This involves removing any rows with missing values to ensure the dataset is complete. This method modifies the dataset in place and does not return a value.
Specified by:
performListWiseDeletion in interface DataPostProcessor
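List-wise deletion as described above can be sketched in a few lines. The class ListWiseDeletionSketch is hypothetical, and treating null or blank cells as missing is an assumption; the page does not say how the library detects missing values.

```java
import java.util.List;

final class ListWiseDeletionSketch {
    /** Removes, in place, every row that contains at least one missing cell. */
    static void deleteIncompleteRows(List<String[]> rows) {
        rows.removeIf(row -> {
            for (String value : row) {
                // Assumption: null or blank means missing.
                if (value == null || value.trim().isEmpty()) {
                    return true;
                }
            }
            return false;
        });
    }
}
```

Note that the list passed in must be modifiable, since rows are removed in place.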
-
getTrainFeatures
public double[][] getTrainFeatures(int[] inputColumns)
Specified by:
getTrainFeatures in interface Dataset
-
getTrainLabels
public double[][] getTrainLabels(int targetColumn)
Specified by:
getTrainLabels in interface Dataset
-
getTestFeatures
public double[][] getTestFeatures(int[] inputColumns)
Specified by:
getTestFeatures in interface Dataset
-
getTestLabels
public double[][] getTestLabels(int targetColumn)
Specified by:
getTestLabels in interface Dataset
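The feature getters above take column indexes and return a double[][]. The sketch below shows one plausible reading of that contract: select the listed columns from string rows and parse them as numbers. It is an assumption for illustration, not the library's code. The class FeatureSketch is hypothetical, and label encoding for the getTrainLabels and getTestLabels methods is not covered.

```java
import java.util.List;

final class FeatureSketch {
    /** Builds a numeric matrix from the given columns of each row. */
    static double[][] selectColumns(List<String[]> rows, int[] inputColumns) {
        double[][] features = new double[rows.size()][inputColumns.length];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < inputColumns.length; c++) {
                // Throws NumberFormatException if a selected cell is not numeric.
                features[r][c] = Double.parseDouble(rows.get(r)[inputColumns[c]]);
            }
        }
        return features;
    }
}
```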
-