Data manipulation
- Data (set) operations
- option: modify the current dataset or add the result as a new dataset
- cut the selected x/y range, or everything outside the selected x/y range
- geometric transform (x/y): translation / scaling (by factor or to range) / rotation(?) (see xmgrace)
- evaluate expression on data set (see xmgrace)
- sort data
- Detrend (remove constant or linear fit, see scipy.signal.detrend)
- Baseline removal (how to define baseline?)
- Spike removal (killspikes.m)
- Feature extraction (see xmgrace)
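
A minimal sketch of two of the operations listed above (range cut and detrend), using only the Python standard library. The function names `cut_range` and `detrend` and the `keep_inside` / `kind` parameters are assumptions made for illustration, not an existing API. The linear case subtracts an ordinary least-squares line, which is roughly what scipy.signal.detrend does for evenly spaced samples.

```python
from statistics import mean


def cut_range(x, y, x_min, x_max, keep_inside=True):
    """Keep the points whose x lies inside [x_min, x_max], or outside it."""
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if (x_min <= xi <= x_max) == keep_inside]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def detrend(x, y, kind="linear"):
    """Subtract the mean ("constant") or a least-squares line ("linear")."""
    y_mean = mean(y)
    if kind == "constant":
        return [yi - y_mean for yi in y]
    x_mean = mean(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("cannot fit a line: all x values are identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4]
    y = [1, 3, 5, 7, 9]          # exactly y = 2x + 1
    print(cut_range(x, y, 1, 3))  # points with 1 <= x <= 3
    print(detrend(x, y))          # residuals after removing the line
```

Both functions return new lists, which fits the "modify current or add new dataset" option: the caller decides whether to overwrite the source data or store the result separately.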
Data cleaning
- https://github.com/rhiever/datacleaner
- drop any row with a missing value
- replace missing values with the mode (for categorical variables) or median (for continuous variables) on a column-by-column basis
- encode non-numerical variables (e.g., categorical variables with strings) with numerical equivalents
- https://github.com/NathanEpstein/Dora
- impute the missing values (using the average of each column)
- scale the values of the input variables (center to mean and scale to unit variance)
- extract an ordinal feature through one-hot encoding
- Introduction to data cleaning with R
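
The cleaning steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not how datacleaner or Dora implement them: rows are assumed to be a list of dicts, `None` marks a missing value, every column is assumed to have at least one non-missing value, and all function names are hypothetical.

```python
from statistics import mean, median, mode, pstdev


def drop_missing(rows):
    """Drop every row that contains at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute(rows):
    """Fill None with the column median (numeric) or mode (anything else)."""
    filled = [dict(r) for r in rows]
    for col in rows[0]:
        present = [r[col] for r in rows if r[col] is not None]
        numeric = all(isinstance(v, (int, float)) for v in present)
        fill = median(present) if numeric else mode(present)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


def encode_categories(rows, col):
    """Replace the labels in `col` with integer codes (sorted label order)."""
    codes = {label: i for i, label in enumerate(sorted({r[col] for r in rows}))}
    return [{**r, col: codes[r[col]]} for r in rows]


def standardize(rows, col):
    """Center `col` to zero mean and scale it to unit (population) variance."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, col: (r[col] - mu) / sigma} for r in rows]


if __name__ == "__main__":
    rows = [
        {"age": 20, "city": "A"},
        {"age": None, "city": "B"},
        {"age": 40, "city": None},
        {"age": 30, "city": "B"},
    ]
    cleaned = standardize(encode_categories(impute(rows), "city"), "age")
    for row in cleaned:
        print(row)
```

Dropping rows and imputing are alternatives: `drop_missing` loses data but adds no artificial values, while `impute` keeps every row at the cost of pulling the column toward its median or mode.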