Tools for Cleaning Rectangular Data
A dependency-free collection of simple functions for cleaning rectangular data. This package lets you detect, count, and replace values, or discard rows/columns, using a predicate function. In addition, it provides tools to check conditions and return informative error messages.
Survey Data Cleaning, Weighting and Analysis
Provides utilities for cleaning survey data, computing weights, and performing descriptive statistical analysis. Methods follow Lohr (2019, ISBN:978-0367272454) "Sampling: Design and Analysis" and Lumley (2010).
Clean Data Frames
Provides a friendly interface for modifying data frames with a sequence of piped commands built upon the 'tidyverse' (Wickham et al., 2019).
'LLM'-Assisted Data Cleaning with Multi-Provider Support
Detects and suggests fixes for semantic inconsistencies in data
frames by calling large language models (LLMs) through a unified,
provider-agnostic interface. Supported providers include 'OpenAI'
('GPT-4o', 'GPT-4o-mini'), 'Anthropic' ('Claude'), 'Google' ('Gemini'),
'Groq' (free-tier 'LLaMA' and 'Mixtral'), and local 'Ollama' models.
The package identifies issues that rule-based tools cannot detect:
abbreviation variants, typographic errors, case inconsistencies, and
malformed values. Results are returned as tidy data frames with column,
row index, detected value, issue type, suggested fix, and confidence
score. An offline fallback using statistical and fuzzy-matching methods
is provided for use without any API key. Interactive fix application
with human review is supported via 'apply_fixes()'. Methods follow
de Jonge and van der Loo (2013)
<https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>
and Chaudhuri et al. (2003).
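A minimal R sketch of the workflow this entry describes. Only `apply_fixes()` and the tidy result columns come from the description above; the detection function name, its arguments, and the sample data are assumptions for illustration, not the package's actual API:

```r
# Hypothetical sketch: every name except apply_fixes() is assumed,
# not taken from the package's documented interface.
df <- data.frame(
  country = c("USA", "U.S.A.", "Untied States", "usa"),
  stringsAsFactors = FALSE
)

# Detect semantic inconsistencies; detect_issues() is a placeholder name.
# Per the description, an offline fuzzy-matching fallback runs when no
# API key is configured.
issues <- detect_issues(df, provider = "ollama")

# The description says results arrive as a tidy data frame with columns:
# column, row index, detected value, issue type, suggested fix, confidence.
print(issues)

# Interactively review and apply the suggested fixes (named in the entry).
df_clean <- apply_fixes(df, issues)
```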
A User-Friendly Biodiversity Data Cleaning App for the Inexperienced R User
Provides features to manage the complete workflow for biodiversity data cleaning: uploading data, gathering input from users (in order to adjust cleaning procedures), cleaning data, and finally generating various reports and several versions of the data. Facilitates user-level data cleaning, designed for the inexperienced R user. T. Gueta et al. (2018).
Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data, as originally described in Zhang and Horvath (2005).
Clean and Standardize Epidemiological Data
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines the data cleaning tasks typically expected when working with epidemiological datasets, returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.
Cleaning Text Data with an AI Assistant
Provides functions to clean and standardize messy data, including textual categories and free-text addresses, using Large Language Models. The package corrects typos, expands abbreviations, and maps inconsistent entries to standardized values. Ideal for bioinformatics, business, and general data cleaning tasks.
Turn Clean Data into Messy Data
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
Wrapper Functions Collection Used in Data Pipelines
The goal of this package is to provide wrapper functions for data cleaning and cleansing processes. These functions help with messaging and user interaction, keep track of information in pipelines, and assist in the wrangling, munging, assessment, and visualization of data-frame-like material.