Python script to delete indices older than a set number of days or hours

Description

Technically, this is more of an Elasticsearch (ES) script, but it is designed to work with logstash and its default of one index created per day.

This requires the pyes module. Configure the variables in the script (decided it was easier this way for the sake of cron), then run it; it will purge all indices older than the specified number of days using the ES API.
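As a rough illustration of the selection step only, here is a minimal sketch that picks out the daily indices older than a cutoff. It assumes the default logstash index naming of logstash-YYYY.MM.dd; the function name expired_indices and its parameters are hypothetical and are not taken from the actual script. The deletion itself (done through pyes in the script) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def expired_indices(index_names, days_to_keep, now=None, prefix="logstash-"):
    """Return the daily indices whose date is more than days_to_keep days old.

    Hypothetical helper: assumes index names look like 'logstash-2013.08.20'.
    """
    if now is None:
        # Naive UTC timestamp, matching the naive dates parsed below.
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    cutoff = now - timedelta(days=days_to_keep)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a logstash daily index; leave it alone
        try:
            index_date = datetime.strptime(name[len(prefix):], "%Y.%m.%d")
        except ValueError:
            continue  # suffix is not a YYYY.MM.dd date; skip it
        if index_date < cutoff:
            expired.append(name)
    return sorted(expired)
```

The returned names could then be passed one by one to whatever delete call the ES client provides. Indices that do not match the expected name pattern are never selected, which keeps unrelated indices out of the purge.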

Activity

Jordan Sissel
August 20, 2013, 5:52 AM

I've put this script here: https://github.com/logstash/expire-logs

Let's make it a community project!

Jordan Sissel
August 20, 2013, 5:52 AM

Marking resolved since the project lives in a git repo now.

If you want access to the repo, please let me know and I will grant it!
Otherwise, the usual fork-patch-pullrequest flow is fine with me.

Magnus Persson
August 21, 2013, 9:01 PM

Aaron, I don't think there is at the moment. See an issue sent to the elasticsearch-knapsack project. It's not really a problem unless the user thinks that knapsack magically "locks" an index from deletion. I don't even think there's a way to mark an index in any way to prevent it from being deleted.

"Knapsack is very simple and works without locking or index snapshots. So it is up to you to organize the safe export and import. If the index changes while Knapsack is exporting, you may lose data in the export. Do not run Knapsack in parallel on the same export."

What I do to avoid running exports in parallel is use jstack and grep for thread(s) named "Knapsack export [<stuff>]". The same should work as a precheck before running expire-logs. Not sure how to match them up, though.
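The jstack-and-grep precheck could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the pid of the Elasticsearch JVM has to be supplied by the caller, and the match is a plain substring test on the thread name prefix quoted above.

```python
import subprocess

# Thread name prefix mentioned in this comment; everything after "[" varies.
EXPORT_THREAD_MARKER = "Knapsack export ["


def export_threads(jstack_output):
    """Return the thread-dump lines that name a Knapsack export thread."""
    return [line for line in jstack_output.splitlines()
            if EXPORT_THREAD_MARKER in line]


def knapsack_export_running(pid):
    """Run jstack against the given JVM pid and look for export threads."""
    dump = subprocess.check_output(["jstack", str(pid)])
    return bool(export_threads(dump.decode("utf-8", "replace")))
```

A purge job could call knapsack_export_running(pid) first and exit early when it returns True. This does not tell which index is being exported, so it only works as an all-or-nothing guard.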

I would think that if expire-logs is being used, knapsack or something similar is also being used. Perhaps just add a friendly reminder in the readme?

Aaron Mildenstein
August 21, 2013, 9:09 PM

Can totally add an FYI on that.

I'm not sure how many are keeping the logs forever. I initially wrote this to keep "searchable" logs for 30 days in ES, and purge after that. It was much more space-saving to keep the plain-text, gzipped or bzipped logs for a long duration. In the event that something needed to be searched for in older logs, grep could be used, or even the data re-indexed into ES again. We never had any intention to keep logs in the ES indexes past 30 days.

Aaron Mildenstein
January 22, 2014, 11:27 PM

Assignee

Jordan Sissel

Reporter

Aaron Mildenstein