First off sort the data, removing any duplicates. In this case duplicates is determined just by the variables sorted on, so permco, permno, and date. This removes the problem of having more than one event on the same day.
Create a temporary table with lag and difference fields. The lag field is more for debugging as the difference field is all we care about. If the event is the first event for the company, leave the fields blank.
Create the output table and only keep events where the event date is either blank (first event) or if the time between events is greater than the window.
mydata: the library where my crsp and cleaned sdc files are
sdc_permno: database with SDC data and permno's instead of cusips (already linked SDC and CRSP)
sdc_final: the output sdc data file with the event dates (non overlapping) and permnos
&beta_pre_length: The macro variable with the length of my pre window for calculating the betas in the market return model
&study_start_date: The macro bariable with the start date
/* SDC_FINAL: Clean up data and remove all SEOs with another SEO within window_prev before. */
PROC SORT DATA=mydata.sdc_permno OUT=tmp_sdc_permno NODUPKEY;
BY permco permno fdate;
BY permco permno fdate idate shares shares_p shares_s;
lag_date = LAG(fdate);
dif_date = DIF(fdate);
IF FIRST.permco THEN DO;
lag_date = .;
dif_date = .;
CREATE TABLE mydata.sdc_final AS
SELECT monotonic() as count, permno, fdate, shares, shares_p, shares_s
WHERE fdate > &study_start_date and (dif_date EQ . OR dif_date > &beta_pre_len)
DROP TABLE tmp_sdc_lag;
DROP TABLE tmp_sdc_permno;