arkhe — by Nicolas Frerebeau, a year ago

Tools for Cleaning Rectangular Data

A dependency-free collection of simple functions for cleaning rectangular data. The package lets you detect, count, and replace values, or discard rows and columns, using a predicate function. It also provides tools to check conditions and return informative error messages.
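
As a rough illustration of what predicate-based cleaning looks like, here is a minimal base R sketch. It does not use arkhe itself, and the toy data frame `df` and the helper `is_missing` are hypothetical names chosen for this example; consult the package documentation for its actual functions.

```r
# Hypothetical toy data; not taken from the arkhe documentation.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# Predicate: TRUE wherever a value is missing.
is_missing <- function(x) is.na(x)

# Detect: which columns contain at least one value matching the predicate?
has_missing <- vapply(df, function(col) any(is_missing(col)), logical(1))

# Count: how many values per column match the predicate?
n_missing <- vapply(df, function(col) sum(is_missing(col)), integer(1))

# Replace: substitute a fixed value wherever the predicate holds.
df_filled <- df
df_filled[is_missing(df_filled)] <- 0

# Discard: drop every row in which any value matches the predicate.
keep_rows <- !apply(is_missing(df), 1, any)
clean <- df[keep_rows, , drop = FALSE]
```

Swapping `is_missing` for another predicate (for example, one that flags negative or infinite values) reuses the same four steps unchanged.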

SurveyStat — by Muhammad Ali, 3 months ago

Survey Data Cleaning, Weighting and Analysis

Provides utilities for cleaning survey data, computing weights, and performing descriptive statistical analysis. Methods follow Lohr (2019, ISBN:978-0367272454), "Sampling: Design and Analysis", and Lumley (2010).

framecleaner — by Harrison Tietze, 2 years ago

Clean Data Frames

Provides a friendly interface for modifying data frames with a sequence of piped commands built upon the 'tidyverse' (Wickham et al., 2019). The majority of commands wrap 'dplyr' mutate statements in a convenient way to concisely solve common issues that arise when tidying small to medium data sets. Includes smart defaults and allows flexible selection of columns via 'tidyselect'.

llmclean — by Sadikul Islam, 2 days ago

'LLM'-Assisted Data Cleaning with Multi-Provider Support

Detects and suggests fixes for semantic inconsistencies in data frames by calling large language models (LLMs) through a unified, provider-agnostic interface. Supported providers include 'OpenAI' ('GPT-4o', 'GPT-4o-mini'), 'Anthropic' ('Claude'), 'Google' ('Gemini'), 'Groq' (free-tier 'LLaMA' and 'Mixtral'), and local 'Ollama' models. The package identifies issues that rule-based tools cannot detect: abbreviation variants, typographic errors, case inconsistencies, and malformed values. Results are returned as tidy data frames with column, row index, detected value, issue type, suggested fix, and confidence score. An offline fallback using statistical and fuzzy-matching methods is provided for use without any API key. Interactive fix application with human review is supported via 'apply_fixes()'. Methods follow de Jonge and van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf> and Chaudhuri et al. (2003).

bdclean — by Thiloshon Nagarajah, 7 years ago

A User-Friendly Biodiversity Data Cleaning App for the Inexperienced R User

Provides features to manage the complete workflow for biodiversity data cleaning: uploading data, gathering input from users (in order to adjust cleaning procedures), cleaning data, and finally generating various reports and several versions of the data. Facilitates user-level data cleaning, designed for the inexperienced R user. See T Gueta et al. (2018) and T Gueta et al. (2017).

WGCNA — by Peter Langfelder, 3 months ago

Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) and Langfelder and Horvath (2008). Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

cleanepi — by Bubacarr Bah, 6 months ago

Clean and Standardize Epidemiological Data

A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines the data cleaning tasks typically expected when working with epidemiological datasets. It returns the processed data in the same format and generates a comprehensive report detailing the outcome of each cleaning task.

emend — by Jiajia Li, a year ago

Cleaning Text Data with an AI Assistant

Provides functions to clean and standardize messy data, including textual categories and free-text addresses, using Large Language Models. The package corrects typos, expands abbreviations, and maps inconsistent entries to standardized values. Ideal for bioinformatics, business, and general data cleaning tasks.

salty — by Matthew Lincoln, 5 months ago

Turn Clean Data into Messy Data

Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
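
To make the idea of "salting" concrete, the following base R sketch injects pseudo-OCR confusions into character data. It is only an illustration under stated assumptions: `salt_ocr`, its `rate` argument, and the small `swaps` table are hypothetical and are not part of the salty package's API.

```r
# Hypothetical helper, not part of the 'salty' package.
salt_ocr <- function(x, rate = 0.2) {
  # Pseudo-OCR confusions: each character maps to a visually similar one.
  swaps <- c("l" = "1", "O" = "0", "S" = "5", "B" = "8")
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    # Positions holding a character that has a lookalike.
    eligible <- which(chars %in% names(swaps))
    # Corrupt each eligible position independently with probability `rate`.
    hit <- eligible[runif(length(eligible)) < rate]
    chars[hit] <- swaps[chars[hit]]
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

set.seed(42)  # make the random corruption reproducible
salted <- salt_ocr(c("Oslo", "Boston", "Salt Lake"), rate = 0.5)
```

Because each substitution is one character for one character, the salted strings keep their original length, which makes it easy to compare them against the clean input when testing a cleaning routine.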

fabR — by Guillaume Fabre, 10 months ago

Wrapper Functions Collection Used in Data Pipelines

The goal of this package is to provide wrapper functions for data cleaning and cleansing processes. These functions help with messages and user interaction, keep track of information in pipelines, and help with the wrangling, munging, assessment, and visualization of data frame-like material.