Python script to delete indices older than a set number of days or hours
Description
Activity
Aaron Mildenstein January 22, 2014 at 11:27 PM
Aaron Mildenstein August 21, 2013 at 9:09 PM
Can totally add an FYI on that.
I'm not sure how many are keeping the logs forever. I initially wrote this to keep "searchable" logs for 30 days in ES, and purge after that. It was much more space-saving to keep the plain-text, gzipped or bzipped logs for a long duration. In the event that something needed to be searched for in older logs, grep could be used, or even the data re-indexed into ES again. We never had any intention to keep logs in the ES indexes past 30 days.
Magnus Persson August 21, 2013 at 9:01 PM
Aaron, I don't think there is at the moment. See an issue sent to the elasticsearch-knapsack project. It's not really a problem unless the user thinks that knapsack magically "locks" an index against deletion. I don't even think there's a way to mark an index to prevent it from being deleted.
— Knapsack is very simple and works without locking or index snapshots. So it is up to you to organize the safe export and import. If the index changes while Knapsack is exporting, you may lose data in the export. Do not run Knapsack in parallel on the same export.
To avoid running exports in parallel, I use jstack and grep for thread(s) named "Knapsack export [<stuff>]". The same check should work as a precheck before running expire-logs. I'm not sure how to match the export threads to specific indices, though.
I would think that if expire-logs is being used, knapsack or something similar is also being used. Perhaps just add a friendly reminder in the readme?
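The jstack precheck described above could be sketched roughly as follows. This is only an illustration: the thread-name prefix is taken from the comment above and has not been verified against a running Knapsack export, and the function names and the `pid` argument are hypothetical.

```python
import subprocess

# Thread-name prefix as described in the comment above (unverified assumption).
# jstack prints each thread's name in double quotes at the start of its header line.
THREAD_PREFIX = '"Knapsack export ['


def knapsack_export_threads(jstack_output):
    """Return the jstack header lines that look like Knapsack export threads."""
    return [
        line.strip()
        for line in jstack_output.splitlines()
        if line.lstrip().startswith(THREAD_PREFIX)
    ]


def export_in_progress(pid):
    """Run jstack against the Elasticsearch JVM (pid is supplied by the caller)
    and report whether any export thread appears in the dump."""
    output = subprocess.check_output(["jstack", str(pid)])
    return bool(knapsack_export_threads(output.decode("utf-8", "replace")))
```

A cron wrapper could call `export_in_progress(pid)` and skip the purge when it returns True. As noted above, this does not tell you which index is being exported, only that some export is running.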
Jordan Sissel August 20, 2013 at 5:52 AM
Marking resolved since the project lives in a git repo now.
If you want access to the repo, please let me know and I will grant it!
Otherwise, the usual fork-patch-pullrequest flow is fine with me.
Jordan Sissel August 20, 2013 at 5:52 AM
I've put this script here: https://github.com/logstash/expire-logs
Let's make it a community project!
Technically, this is more of an ES script, but it has been designed to work with logstash and its default of one index per day.
This requires the pyes module. The settings are variables in the script itself (I decided this was easier for the sake of cron). After configuring them, just run the script and it will purge all indices older than the specified number of days using the ES API.
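As a rough illustration of the age check described above (not the script's actual code), the selection step could look like this. It assumes indices follow the logstash daily naming of `logstash-YYYY.MM.DD`; the function name and parameters are hypothetical, and the delete call itself is left to whatever ES client is in use.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed naming scheme: one index per day, e.g. logstash-2013.08.20
INDEX_PATTERN = re.compile(r"^logstash-(\d{4}\.\d{2}\.\d{2})$")


def expired_indices(index_names, max_age_days, now=None):
    """Return the daily indices whose date is older than max_age_days."""
    if now is None:
        # Naive UTC timestamp, comparable with the parsed index dates.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for name in index_names:
        match = INDEX_PATTERN.match(name)
        if not match:
            continue  # not a daily logstash index, leave it alone
        try:
            index_date = datetime.strptime(match.group(1), "%Y.%m.%d")
        except ValueError:
            continue  # looks like a date but is not a valid one
        if index_date < cutoff:
            expired.append(name)
    return sorted(expired)
```

Each returned name would then be passed to the client's index-delete call. Keeping the selection separate from the deletion makes it easy to do a dry run from cron before purging anything.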