a library containing a collection of utility functions designed to filter and process text data based on certain criteria. These functions are useful for various text processing tasks, such as removing unwanted characters, extracting specific information, or cleaning input data.
- Alphabetic Filtering: Easily filter out non-alphabetic characters from text data.
- Numeric Filtering: Quickly remove non-numeric characters from text strings.
- Alphanumeric Filtering: Filter text to retain only alphanumeric characters, excluding special symbols.
- Customization: Ability to customize the filtering criteria based on specific requirements.
- TextCleaning: Cleanse input text from unwanted characters to prepare it for further processing or analysis.
- Normalization: Standardize text data by removing irregular characters or symbols
- Python v3.11.6
pip install dekimashita
Filter dictionary values recursively, ignoring specified characters.
Args:
data (dict or list): Data (dictionary or list containing dictionaries) to filter.
chars (list): List of characters to filter.
Returns:
dict or list: Filtered data.
data = {
"university": {
"name": "Example University",
"location": "City XYZ",
"courses": [
{
"course_id": "CS101",
"title": "Introduction \n to \n Computer \n\n Science",
"lecturer": {
"name": "Dr. Alan\n Smith",
"email": "[email protected]",
"office": {
"building": "Engineering Tower",
"room_number": "123"
}
},
"students": {
"name": "John Doe",
"student_id": "123456",
"email": "[email protected]",
"grades": {
"assignments": [
{
"assignment_id": "001",
"score": 95,
"comments": "Great job on the assignment!\nKeep up the good work."
},
{
"assignment_id": "002",
"score": 85,
"comments": "Your\n effort is commendable.\r\r However,\nthere is room"
}
],
"final_exam": {
"score": 88,
"comments": "Solid \nperformance overall.\n\rYour understanding of the subject"
}
}
}
}
]
}
}
import json
data = # data_sample
with open("data.json", "w") as json_file:
json.dump(data, json_file, indent=4)
If you have a very complex dictionary and you write without using the Dekimashita filter you will get results like this
import json
from dekimashita import Dekimashita
data = # data_sample
clear = Dekimashita.vdict(data, ['\n', '\r'])
with open("data.json", "w") as json_file:
json.dump(clear, json_file, indent=4)
Remove extra spaces from text.
Args:
text (str): Input text.
Returns:
str: Text with extra spaces removed.
from dekimashita import Dekimashita
text = 'moon beautiful isn"t it'
clear = Dekimashita.vspace(text)
print('without Dekimashita filter: ' text)
print('with Dekimashita filter: ' clear)
# output
without Dekimashita filter: moon beautiful isn"t it
with Dekimashita filter: moon beautiful isn"t it
Remove non-alphabetic characters (except a-z, A-Z) from text.
Args:
text (str): Input text.
Returns:
str: Filtered text containing only alphabetic characters.
from dekimashita import Dekimashita
text = 'mo&on b)(*&^%$e!au!t@#$i*f!ul is!!$#n"t i)(*&^t'
clear = Dekimashita.valpha(text)
print('without Dekimashita filter: ' text)
print('with Dekimashita filter: ' clear)
# output
without Dekimashita filter: mo&on b)(*&^%$e!au!t@#$i*f!ul is!!$#n"t i)(*&^t
with Dekimashita filter: moon beautiful isnt it
Remove non-numeric characters from text.
Args:
text (str): Input text.
Returns:
str: Filtered text containing only numeric characters.
from dekimashita import Dekimashita
text = ' mo30on be7aut20iful i05sn"t it'
clear = Dekimashita.vnum(text)
print('without Dekimashita filter: ' text)
print('with Dekimashita filter: ' clear)
# output
without Dekimashita filter: mo30on be7aut20iful i05sn"t it
with Dekimashita filter: 3072005
Remove non-alphanumeric characters (except a-z, A-Z, 0-9) from text.
Double spaces are replaced with a single space.
Args:
text (str): Input text.
Returns:
str: Filtered text containing only alphanumeric characters.
from dekimashita import Dekimashita
text = 'moon \t\t bea^%$#@utiful isn"t it 30705'
clear = Dekimashita.vtext(text)
print('without Dekimashita filter: ' text)
print('with Dekimashita filter: ' clear)
# output
without Dekimashita filter: moon bea^%$#@utiful isn"t it 30705
with Dekimashita filter: moon beautiful isnt it 30705
"""
Remove non-alphanumeric characters (except a-z, A-Z, 0-9) from text.
Convert all letters to lowercase. Replace spaces with a specified separator.
Double separators are replaced with a single separator.
Args:
text (str): Input text.
separator (str): Separator to replace spaces (default is '_').
Returns:
str: Filtered and normalized text.
"""
from dekimashita import Dekimashita
text = 'Moon Beautiful Isn"t It'
clear = Dekimashita.vdir(text)
print('without Dekimashita filter: ' text)
print('with Dekimashita filter: ' clear)
# output
without Dekimashita filter: Moon Beautiful Isn"t It
with Dekimashita filter: moon_beautiful_isnt_it
│ LICENSE
│ README.md
│ setup.py
│
└───dekimashita
dekimashita.py
__init__.py
👤 Rio Dwi Saputra
- Twitter: @ryyo_cs
- Github: @ryyos
- Instagram: @ryyo.cs
- LinkedIn: @rio-dwi-saputra-23560b287