Class DataProcessor

java.lang.Object
de.edux.data.provider.DataProcessor
All Implemented Interfaces:
Dataloader, DataPostProcessor, Dataset

public class DataProcessor extends Object implements DataPostProcessor, Dataset, Dataloader
  • Constructor Details

    • DataProcessor

      public DataProcessor(IDataReader dataReader)
  • Method Details

    • split

      public DataProcessor split(double splitRatio)
      Description copied from interface: DataPostProcessor
      Splits the dataset into two separate datasets according to the specified split ratio. The split ratio determines the proportion of data to be used for the first dataset (e.g., training set).
      Specified by:
      split in interface DataPostProcessor
      Parameters:
      splitRatio - the ratio for splitting the dataset, where 0 < splitRatio < 1
      Returns:
      a DataProcessor instance containing the first portion of the dataset according to the split ratio
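      The ratio-based split described above can be illustrated with a minimal sketch. SplitSketch and its methods are hypothetical names, not part of de.edux, and truncating the cut index toward zero is an assumption; the library may round differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of de.edux: illustrates a ratio-based split.
final class SplitSketch {

    // Assumption: the cut index is size * splitRatio, truncated toward zero.
    static int cutIndex(int size, double splitRatio) {
        if (!(splitRatio > 0.0 && splitRatio < 1.0)) {
            throw new IllegalArgumentException("splitRatio must satisfy 0 < splitRatio < 1");
        }
        return (int) (size * splitRatio);
    }

    // Rows before the cut, e.g. a training set.
    static List<String[]> firstPortion(List<String[]> rows, double splitRatio) {
        return new ArrayList<>(rows.subList(0, cutIndex(rows.size(), splitRatio)));
    }

    // Rows from the cut onward, e.g. a test set.
    static List<String[]> secondPortion(List<String[]> rows, double splitRatio) {
        return new ArrayList<>(rows.subList(cutIndex(rows.size(), splitRatio), rows.size()));
    }
}
```

      Under these assumptions, a ratio of 0.5 on four rows puts two rows in each portion. A ratio outside the open interval (0, 1) is rejected here because either portion would otherwise be empty.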
    • loadDataSetFromCSV

      public DataProcessor loadDataSetFromCSV(File csvFile, char csvSeparator, boolean skipHead, int[] inputColumns, int targetColumn)
      Description copied from interface: Dataloader
      Loads a dataset from the specified CSV file, processes it, and returns a DataProcessor that is ready to be used for further operations such as data manipulation or analysis.
      Specified by:
      loadDataSetFromCSV in interface Dataloader
      Parameters:
      csvFile - the CSV file to load the data from
      csvSeparator - the character that separates values in a row in the CSV file
      skipHead - a boolean indicating whether to skip the header row (true) or not (false)
      inputColumns - an array of indexes indicating which columns to include as input features
      targetColumn - the index of the column to use as the output label or target for predictions
      Returns:
      a DataProcessor object that contains the processed data
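      To make the roles of csvSeparator, skipHead and inputColumns concrete, here is a minimal sketch of how such parameters could be applied to raw lines. CsvSketch is a hypothetical helper, not the library's reader; it assumes zero-based column indexes, numeric input columns, and no quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of de.edux: naive CSV parsing without quoted-field support.
final class CsvSketch {

    // Splits each line on the separator and optionally drops the first (header) line.
    static List<String[]> parse(List<String> lines, char csvSeparator, boolean skipHead) {
        // Quote the separator so characters such as '|' or '.' are not read as regex syntax.
        String delimiter = Pattern.quote(String.valueOf(csvSeparator));
        List<String[]> rows = new ArrayList<>();
        for (int i = skipHead ? 1 : 0; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields instead of discarding them.
            rows.add(lines.get(i).split(delimiter, -1));
        }
        return rows;
    }

    // Picks the input columns of one row and converts them to doubles.
    // Assumption: column indexes are zero-based and the cells hold numeric text.
    static double[] inputsOf(String[] row, int[] inputColumns) {
        double[] values = new double[inputColumns.length];
        for (int i = 0; i < inputColumns.length; i++) {
            values[i] = Double.parseDouble(row[inputColumns[i]]);
        }
        return values;
    }
}
```

      An empty cell survives parsing as an empty string, so missing values remain visible to later steps such as imputation or list-wise deletion.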
    • normalize

      public DataPostProcessor normalize()
      Description copied from interface: DataPostProcessor
      Normalizes the dataset. This typically involves scaling the values of numeric attributes so that they share a common scale, often between 0 and 1, without distorting differences in the ranges of values.
      Specified by:
      normalize in interface DataPostProcessor
      Returns:
      the DataPostProcessor instance with normalized data for method chaining
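      Scaling to a common range between 0 and 1 is commonly done with min-max normalization. The sketch below shows that technique per column; MinMaxSketch is a hypothetical name, and whether DataProcessor uses exactly this formula is an assumption.

```java
// Hypothetical helper, not part of de.edux: per-column min-max scaling to [0, 1].
final class MinMaxSketch {

    // Returns a scaled copy; each value becomes (value - min) / (max - min) of its column.
    static double[][] normalize(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[][] out = new double[data.length][cols];
        for (int c = 0; c < cols; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < data.length; r++) {
                // Assumption: a constant column maps to 0.0 to avoid dividing by zero.
                out[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return out;
    }
}
```

      With this formula the smallest value of a column maps to 0 and the largest to 1, so relative distances inside a column are preserved.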
    • shuffle

      public DataPostProcessor shuffle()
      Description copied from interface: DataPostProcessor
      Shuffles the dataset randomly. This is usually done to ensure that the data does not carry any inherent bias in the order it was collected or presented.
      Specified by:
      shuffle in interface DataPostProcessor
      Returns:
      the DataPostProcessor instance with shuffled data for method chaining
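      A random shuffle of rows can be sketched with the standard library. ShuffleSketch is a hypothetical helper, and the explicit seed is an assumption added here for reproducibility; the shuffle() method above takes no seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of de.edux: reproducible shuffling of rows.
final class ShuffleSketch {

    // Returns a shuffled copy; the input list is left untouched.
    static <T> List<T> shuffled(List<T> rows, long seed) {
        List<T> copy = new ArrayList<>(rows);
        // The same seed always yields the same permutation for the same input.
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

      Shuffling before a split helps when the source file is ordered, for example sorted by class label, because an unshuffled split would then put whole classes into only one portion.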
    • getDataset

      public List<String[]> getDataset()
      Description copied from interface: DataPostProcessor
      Retrieves the processed dataset as a list of string arrays. Each string array represents a row in the dataset.
      Specified by:
      getDataset in interface DataPostProcessor
      Returns:
      a list of string arrays representing the dataset
    • getInputs

      public double[][] getInputs(List<String[]> dataset, int[] inputColumns)
      Specified by:
      getInputs in interface Dataset
    • getTargets

      public double[][] getTargets(List<String[]> dataset, int targetColumn)
      Specified by:
      getTargets in interface Dataset
    • getClassMap

      public Map<String,Integer> getClassMap()
      Specified by:
      getClassMap in interface Dataset
    • getColumnDataOf

      public String[] getColumnDataOf(int columnIndex)
      Specified by:
      getColumnDataOf in interface Dataset
    • imputation

      public DataPostProcessor imputation(int columnIndex, ImputationStrategy imputationStrategy)
      Description copied from interface: DataPostProcessor
      Imputes missing values in the column at the specified index using the provided imputation strategy.
      Specified by:
      imputation in interface DataPostProcessor
      Parameters:
      columnIndex - the index of the column to apply imputation to
      imputationStrategy - the strategy to use for imputing missing values
      Returns:
      the DataPostProcessor instance with imputed data for method chaining
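      This page does not list the values of ImputationStrategy, so the sketch below illustrates one common strategy, replacing missing cells with the column mean. MeanImputationSketch is a hypothetical helper; treating null and blank cells as missing, and the column being numeric, are assumptions.

```java
import java.util.List;

// Hypothetical helper, not part of de.edux: mean imputation for one numeric column.
final class MeanImputationSketch {

    // Assumption: a cell counts as missing when it is null or blank.
    static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Replaces missing cells in the given column with the mean of the present cells.
    static void imputeMean(List<String[]> rows, int columnIndex) {
        double sum = 0.0;
        int count = 0;
        for (String[] row : rows) {
            if (!isMissing(row[columnIndex])) {
                sum += Double.parseDouble(row[columnIndex]);
                count++;
            }
        }
        if (count == 0) {
            return; // no present values, so there is no mean to fill in
        }
        String mean = String.valueOf(sum / count);
        for (String[] row : rows) {
            if (isMissing(row[columnIndex])) {
                row[columnIndex] = mean;
            }
        }
    }
}
```

      Unlike list-wise deletion, imputation keeps every row, at the cost of introducing estimated values into the column.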
    • performListWiseDeletion

      public void performListWiseDeletion()
      Description copied from interface: DataPostProcessor
      Performs list-wise deletion on the dataset. This involves removing any rows with missing values to ensure the dataset is complete. This method modifies the dataset in place and does not return a value.
      Specified by:
      performListWiseDeletion in interface DataPostProcessor
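      List-wise deletion as described above can be sketched as an in-place filter over the rows. ListWiseDeletionSketch is a hypothetical helper, and treating null and blank cells as missing is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of de.edux: removes incomplete rows in place.
final class ListWiseDeletionSketch {

    // Drops every row that has at least one null or blank cell; the list must be mutable.
    static void deleteIncompleteRows(List<String[]> rows) {
        rows.removeIf(row ->
                Arrays.stream(row).anyMatch(value -> value == null || value.trim().isEmpty()));
    }
}
```

      Because the list is modified directly, callers that still need the original rows would have to copy them beforehand.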
    • getTrainFeatures

      public double[][] getTrainFeatures(int[] inputColumns)
      Specified by:
      getTrainFeatures in interface Dataset
    • getTrainLabels

      public double[][] getTrainLabels(int targetColumn)
      Specified by:
      getTrainLabels in interface Dataset
    • getTestFeatures

      public double[][] getTestFeatures(int[] inputColumns)
      Specified by:
      getTestFeatures in interface Dataset
    • getTestLabels

      public double[][] getTestLabels(int targetColumn)
      Specified by:
      getTestLabels in interface Dataset