Data set operations
Open, Needs Triage, Public

Description

Data manipulation

  • Data (set) operations
    • option: change current/add new dataset
    • cut selected x/y-range/outside selected x/y-range
    • geometric transform(x/y): translation/scaling (factor/to range)/rotation(?) (see xmgrace)
    • evaluate expression on data set (see xmgrace)
    • sort data
    • Detrend (remove constant/linear fit, see scipy.signal.detrend)
    • Baseline removal (how to define baseline?)
  • Spike removal (killspikes.m)
  • Feature extraction (see xmgrace)
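
A few of the data set operations listed above can be sketched in plain Python to make the intended behaviour concrete. This is only an illustration, not a proposed implementation: the function names and the list-of-(x, y)-tuples representation are assumptions made for this example. The linear detrend fits against the actual x values, whereas scipy.signal.detrend works on the sample index, so results differ for unevenly spaced data.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) representation of one data point


def cut_x_range(points: List[Point], x_min: float, x_max: float,
                keep_inside: bool = True) -> List[Point]:
    """Keep points with x in [x_min, x_max], or those outside if keep_inside is False."""
    return [(x, y) for x, y in points if (x_min <= x <= x_max) == keep_inside]


def sort_by_x(points: List[Point]) -> List[Point]:
    """Return the points sorted by ascending x."""
    return sorted(points, key=lambda p: p[0])


def translate_scale(points: List[Point], dx: float = 0.0, dy: float = 0.0,
                    sx: float = 1.0, sy: float = 1.0) -> List[Point]:
    """Scale each coordinate by (sx, sy), then translate by (dx, dy)."""
    return [(x * sx + dx, y * sy + dy) for x, y in points]


def detrend_constant(points: List[Point]) -> List[Point]:
    """Subtract the mean of y from every y value."""
    mean_y = sum(y for _, y in points) / len(points)
    return [(x, y - mean_y) for x, y in points]


def detrend_linear(points: List[Point]) -> List[Point]:
    """Subtract the least-squares straight line y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points for a linear fit")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [(x, y - (slope * x + intercept)) for x, y in points]
```

Each function returns a new list instead of modifying its input, which corresponds to the "add new dataset" option; overwriting the current data set would simply assign the result back.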

Data cleaning

  • https://github.com/rhiever/datacleaner
    • drop any row with a missing value
    • replace missing values with the mode (for categorical variables) or median (for continuous variables) on a column-by-column basis
    • encode non-numerical variables (e.g., categorical variables with strings) with numerical equivalents
  • https://github.com/NathanEpstein/Dora
    • impute the missing values (using the average of each column)
    • scale the values of the input variables (center to mean and scale to unit variance)
    • extract an ordinal feature through one-hot encoding
  • Introduction to data cleaning with R
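
The cleaning steps listed above can be sketched with the Python standard library alone. This is a rough illustration under assumptions, not how datacleaner or Dora are actually implemented: the function names, the use of None for a missing value, and the choice of the population standard deviation for scaling are all made up for this example.

```python
from statistics import fmean, median, mode, pstdev


def drop_incomplete_rows(rows):
    """Drop every row (a dict) that contains a missing value (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_column(values):
    """Fill None with the median (numeric column) or the mode (categorical column)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no known values to impute from")
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in present)
    fill = median(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


def encode_categories(values):
    """Map each distinct label to an integer code (labels sorted for stability)."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def standardize(values):
    """Center to zero mean and scale to unit (population) variance."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Return one 0/1 indicator list per value, plus the label order used."""
    labels = sorted(set(values))
    return [[1 if v == label else 0 for label in labels] for v in values], labels
```

Dropping rows and imputing are alternatives for the same problem, so a user-facing dialog would presumably offer one or the other per column.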
sgerlach updated the task description. May 4 2018, 10:49 PM
sgerlach claimed this task.
asemke updated the task description. Sep 14 2018, 11:39 AM
asemke moved this task from Current Release to Backlog on the LabPlot board.