Top Pandas Packages And Libraries
Pandas is an awesome Python package, used especially in data science applications. Its dataframe data structure is very effective for data manipulation and supports operations such as summarization, aggregation, and visualization. Check out the following resources to learn more about pandas:
https://www.nbshare.io/notebook/823959095/Pivot-Tables-In-Python-Pandas/
https://www.nbshare.io/notebook/703251111/Most-Frequently-Asked-Questions-Python-Pandas-Part1/
https://www.nbshare.io/notebook/17251835/Summarising-Aggregating-and-Grouping-data-in-Python-Pandas/
Pandas is a great library in its own right, with excellent community support. The Python community has developed a lot of packages that build on it. In this post, I will go over the Pandas packages that everyone should have in their toolbox.
Pandas-Profiling
Everyone knows the df.describe() function, which summarizes each column and provides some basic stats. Pandas-profiling takes df.describe() to the next level. It provides the following features:
- Detect the types of columns in a dataframe.
- Essentials: type, unique values, missing values
- Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
- Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
- Most frequent values
- Histogram
- Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
- Missing values: matrix, count, heatmap and dendrogram of missing values
- Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
- File and image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information
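To make the comparison with df.describe() concrete, here is a minimal sketch. The sample data and the helper name write_profile are made up for illustration. It also assumes the package is installed under its newer name, ydata-profiling (older releases were imported as pandas_profiling), so treat the import as something to adapt to your installed version.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", None],
    "temp_c": [3.5, 19.0, 4.1, 21.3],
})

# Baseline: describe() covers the numeric columns only.
summary = df.describe()
# Missing values have to be counted separately.
missing = df.isna().sum()
print(summary)
print(missing)


def write_profile(frame, path):
    """Write an HTML profile report; return False if the package is missing."""
    try:
        # Assumption: the package is installed as ydata-profiling.
        from ydata_profiling import ProfileReport
    except ImportError:
        return False
    ProfileReport(frame, title="Sample profile").to_file(path)
    return True
```

Calling write_profile(df, "sample_profile.html") should produce a single HTML file covering the statistics, correlations and missing-value views listed above, which you can open in a browser.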
Pandas-Datareader
This package does the following:
- Remote Data Access - extracts stock and other financial data from various online sources
- Caching Queries - avoids repeated downloads of the same data, so code runs faster
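Both features can be sketched in a few lines. The helper name fetch_fred_series, the FRED series "GDP", the date range and the cache file name are my own choices for illustration. The caching part assumes the separate requests-cache package is installed. Data sources change over time, so check that the source you need is still supported.

```python
import datetime as dt


def fetch_fred_series(series_id, start, end, cache_name=None):
    """Download one FRED series as a dataframe, optionally through a cache."""
    # Imported lazily so this sketch loads without the optional packages.
    import pandas_datareader.data as web

    session = None
    if cache_name is not None:
        import requests_cache
        # Responses are kept in a local SQLite file and reused for one day.
        session = requests_cache.CachedSession(
            cache_name=cache_name,
            backend="sqlite",
            expire_after=dt.timedelta(days=1),
        )
    return web.DataReader(series_id, "fred", start, end, session=session)


start = dt.datetime(2015, 1, 1)
end = dt.datetime(2020, 1, 1)
# Needs network access, so the call is left commented out:
# gdp = fetch_fred_series("GDP", start, end, cache_name="fred_cache")
```

With a cache name set, a second call with the same arguments should be answered from the local file instead of hitting the remote service again.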
Qgrid
- This one is a great utility if you use Jupyter notebooks. Qgrid embeds an interactive widget in the notebook. Using this widget, you can scroll, sort and filter the dataframe interactively.
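A rough sketch of how this looks in a notebook is below. The sample dataframe and the helper name show_interactive are made up for illustration. The widget only renders inside Jupyter with a compatible ipywidgets setup, so the calls that need a notebook are left as comments.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [3, 1, 2]})


def show_interactive(frame):
    """Return a Qgrid widget for the given dataframe."""
    # Imported lazily: qgrid is only useful inside a Jupyter notebook.
    import qgrid
    return qgrid.show_grid(frame, show_toolbar=True)


# In a notebook cell:
# widget = show_interactive(df)
# widget                                # renders the interactive grid
# current = widget.get_changed_df()     # dataframe reflecting sorting, filtering and edits
```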
Feather
- Feather is a data format designed specifically for reading and writing dataframes quickly. I highly recommend it if you are dealing with large dataframes.
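As a quick illustration, the sketch below writes a small dataframe to a Feather file and reads it back using pandas' own to_feather and read_feather. The sample data and file name are made up. These pandas methods rely on the pyarrow package, so the round trip is skipped when it is not installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"id": range(5), "score": [0.5, 1.5, 2.5, 3.5, 4.5]})

try:
    import pyarrow  # noqa: F401  (backend used by pandas for Feather)
    has_pyarrow = True
except ImportError:
    has_pyarrow = False

round_trip_ok = None
if has_pyarrow:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scores.feather")
        df.to_feather(path)               # write the dataframe to disk
        restored = pd.read_feather(path)  # read it back
    # True when values and dtypes survived the round trip.
    round_trip_ok = restored.equals(df)

print(round_trip_ok)
```

Any speed advantage shows up with large dataframes. To check it on your own data, time this round trip against to_csv and read_csv.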