This repository was archived by the owner on Jul 16, 2024. It is now read-only.
or
```python
df = read_csv("*.csv")
```
#### Post-Processing the Input Data
Both `read_json()` and `read_csv()` support an optional `post_function` parameter, which lets you specify a function that post-processes the data from each individual file after it is read and before it is merged into the final returned DataFrame. For example, you might want to split or combine columns, or compute a new value from existing data.
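As a rough mental model of this read, post-process, then merge flow, here is a plain-pandas sketch. It is not huntlib's actual implementation: the helper name `read_csv_with_post` is hypothetical, and it assumes that matching files are read in sorted order, that extra keyword arguments go straight to `pandas.read_csv()`, and that the per-file results are merged with `pandas.concat()`.

```python
import glob

import pandas as pd


def read_csv_with_post(pattern, post_function=None, **kwargs):
    """Hypothetical sketch: read every file matching `pattern`, post-process each, then merge."""
    frames = []
    for filename in sorted(glob.glob(pattern)):
        # Read one file into its own DataFrame; extra kwargs are passed to pandas.
        df = pd.read_csv(filename, **kwargs)
        if post_function is not None:
            # The post-processor sees one file's data plus that file's name,
            # and its return value is what gets kept.
            df = post_function(df, filename)
        frames.append(df)
    if not frames:
        # No matching files: return an empty DataFrame instead of raising.
        return pd.DataFrame()
    # Merge the per-file results into a single DataFrame with a fresh index.
    return pd.concat(frames, ignore_index=True)
```

With `post_function=None` (the default in this sketch) the files are simply read and concatenated; huntlib's own functions may differ in details such as file ordering and index handling.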
Consult the Pandas documentation for information on supported options for `read_csv()` and `read_json()`.
Start by creating a post-processing function according to the following prototype:
```python
def my_post_processor(df, filename):
    # Modify df here: split or combine columns, compute new values, etc.
    # `filename` is the name of the file this chunk of data was read from.

    return df
```
When called, the `df` parameter will be a DataFrame containing the chunk of data just read, and the `filename` parameter will be the name of the file that chunk came from, so it differs from chunk to chunk. **IT IS IMPORTANT THAT YOU RETURN `df`, whether or not you modified the input DataFrame.**
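A hypothetical post-processor that uses both parameters might look like the sketch below; the `src` column and its `ip:port` format are assumptions made for illustration. Note that some pandas methods, such as `DataFrame.assign()`, return a new DataFrame instead of modifying the input, which is one more reason to always return the DataFrame you want kept.

```python
import os

import pandas as pd


def split_and_tag(df, filename):
    """Hypothetical post-processor; the "src" column name and format are assumptions."""
    # Split values like "10.0.0.1:443" into separate IP and port columns.
    # str.partition always yields three columns: before, separator, after.
    parts = df["src"].str.partition(":")
    df["src_ip"] = parts[0]
    df["src_port"] = parts[2]
    # assign() returns a new DataFrame rather than changing `df` in place,
    # so rebind the name to keep the result.
    df = df.assign(source_file=os.path.basename(filename))
    # Return the (new) DataFrame; without this line the function returns None.
    return df
```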
Once you have defined the post-processor function, you can invoke it during your call to `read_json()` or `read_csv()` like so: