AcF 351b Career Skills in Accounting and Finance

Python for Data Analysis

Stream Assignment Part 1

Given information on a sample of US and UK corporate bond issues in the US bond market, download and analyse financial and accounting data from the WRDS database to evaluate the impact of Brexit on corporate bond prices.

💲1. Download and process bond yield data

The bond yield data are downloaded from TRACE on WRDS via the WRDS web interface.

The downloaded file is saved as ‘e792b11838182376.csv’.

Now process the data in the file…

import pandas as pd
# check pandas version
print(pd.__version__)
1.1.3
# read csv as pandas dataframe
trace_on_wrds = '../data/e792b11838182376.csv'
yields = pd.read_csv(trace_on_wrds, usecols=[0, 3, 4, 5], parse_dates=[1])
# show dataframe
yields
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt
0 037833BC3 2015-05-07 7:57:03 NaN
1 037833BC3 2015-05-07 12:29:00 NaN
2 037833BC3 2015-05-07 12:29:00 NaN
3 037833BC3 2015-05-07 12:34:48 NaN
4 037833BC3 2015-05-07 12:37:44 NaN
... ... ... ... ...
1326280 984121CK7 2017-12-28 12:08:56 NaN
1326281 984121CK7 2017-12-28 12:08:56 NaN
1326282 984121CK7 2017-12-28 12:08:57 NaN
1326283 984121CK7 2017-12-29 8:00:11 NaN
1326284 984121CK7 2017-12-29 8:00:11 NaN

1326285 rows × 4 columns

# extract month and year from the date field
yields['year_month'] = yields['trd_exctn_dt'].dt.to_period('M')
yields
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
0 037833BC3 2015-05-07 7:57:03 NaN 2015-05
1 037833BC3 2015-05-07 12:29:00 NaN 2015-05
2 037833BC3 2015-05-07 12:29:00 NaN 2015-05
3 037833BC3 2015-05-07 12:34:48 NaN 2015-05
4 037833BC3 2015-05-07 12:37:44 NaN 2015-05
... ... ... ... ... ...
1326280 984121CK7 2017-12-28 12:08:56 NaN 2017-12
1326281 984121CK7 2017-12-28 12:08:56 NaN 2017-12
1326282 984121CK7 2017-12-28 12:08:57 NaN 2017-12
1326283 984121CK7 2017-12-29 8:00:11 NaN 2017-12
1326284 984121CK7 2017-12-29 8:00:11 NaN 2017-12

1326285 rows × 5 columns

# sort dataframe first by bond issue, then by trade date, and then by trade time in ascending order.
yields = yields.sort_values(by=['CUSIP_ID', 'trd_exctn_dt', 'trd_exctn_tm'], ascending=True)
yields[64:80]
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
1198536 00077TAB0 2015-01-05 14:27:45 6.608047 2015-01
1198537 00077TAB0 2015-01-05 14:27:45 6.525827 2015-01
1198538 00077TAB0 2015-01-06 10:23:34 6.842175 2015-01
1198539 00077TAB0 2015-01-06 10:23:34 6.786439 2015-01
1198540 00077TAB0 2015-02-17 11:24:15 7.002821 2015-02
1198541 00077TAB0 2015-02-17 11:24:15 6.897267 2015-02
111651 00077TAB0 2015-03-31 14:47:54 6.758606 2015-03
111652 00077TAB0 2015-03-31 14:59:18 6.816467 2015-03
111653 00077TAB0 2015-03-31 14:59:18 6.816467 2015-03
111654 00077TAB0 2015-04-24 12:10:55 6.029157 2015-04
111655 00077TAB0 2015-04-24 12:10:55 6.029157 2015-04
111656 00077TAB0 2015-06-04 11:17:07 6.404942 2015-06
111657 00077TAB0 2015-06-10 11:09:56 6.492385 2015-06
111658 00077TAB0 2015-06-10 11:09:56 6.489400 2015-06
111659 00077TAB0 2015-06-10 11:09:56 6.484927 2015-06
111660 00077TAB0 2015-07-07 14:35:19 6.901996 2015-07
# for each bond issue in every trade day, we only keep the last yield that occurs in the day.
yields = yields.drop_duplicates(subset=['CUSIP_ID', 'trd_exctn_dt'], keep='last')
yields
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
1198533 00077TAA2 2015-01-26 16:01:58 NaN 2015-01
1198534 00077TAA2 2015-01-27 13:37:06 NaN 2015-01
1198535 00077TAA2 2015-03-24 16:37:01 NaN 2015-03
111591 00077TAA2 2015-03-26 13:50:47 NaN 2015-03
111592 00077TAA2 2015-05-01 11:03:37 NaN 2015-05
... ... ... ... ... ...
56862 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11
56866 98934KAB6 2017-11-28 8:08:18 2.865028 2017-11
56877 98934KAB6 2017-11-30 17:23:00 2.971758 2017-11
56879 98934KAB6 2017-12-04 9:27:55 3.009462 2017-12
56881 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12

149595 rows × 5 columns

# sort yields first by cusip, then by year_month, and then by yld_pt
yields = yields.sort_values(by=['CUSIP_ID', 'year_month', 'yld_pt'])
# check results
yields[0:10]
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
1198533 00077TAA2 2015-01-26 16:01:58 NaN 2015-01
1198534 00077TAA2 2015-01-27 13:37:06 NaN 2015-01
1198535 00077TAA2 2015-03-24 16:37:01 NaN 2015-03
111591 00077TAA2 2015-03-26 13:50:47 NaN 2015-03
111592 00077TAA2 2015-05-01 11:03:37 NaN 2015-05
111593 00077TAA2 2015-07-21 11:43:45 NaN 2015-07
111594 00077TAA2 2015-07-27 9:14:28 NaN 2015-07
111595 00077TAA2 2015-07-30 8:12:42 NaN 2015-07
111597 00077TAA2 2015-08-20 12:06:06 NaN 2015-08
111598 00077TAA2 2015-09-17 15:45:59 NaN 2015-09
# group yields by cusip and year_month, then get the indices of each group
yields_g = yields.groupby(['CUSIP_ID', 'year_month']).indices
# the indices of group items in source dataframe
list(yields_g.values())[0:10]
[array([0, 1]),
 array([2, 3]),
 array([4]),
 array([5, 6, 7]),
 array([8]),
 array([9]),
 array([10, 11]),
 array([12, 13]),
 array([14]),
 array([15])]
# get the middle index of each group in the source dataframe as a list
import numpy as np
inds_median = [(sum(x) // len(x)) for x in list(yields_g.values())]
inds_median[0:10]
[0, 2, 4, 6, 8, 9, 10, 12, 14, 15]
# locate the yields data of each group in the source dataframe
# the value picked this way is not the arithmetic median
# for a group with an even number of rows, the arithmetic median averages the two middle values, whereas this approach simply returns one of the two middle values
yields_median = yields.iloc[inds_median]
# the result here differs slightly from the expected output given in the FAQ, but it is still acceptable
yields_median
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
1198533 00077TAA2 2015-01-26 16:01:58 NaN 2015-01
1198535 00077TAA2 2015-03-24 16:37:01 NaN 2015-03
111592 00077TAA2 2015-05-01 11:03:37 NaN 2015-05
111594 00077TAA2 2015-07-27 9:14:28 NaN 2015-07
111597 00077TAA2 2015-08-20 12:06:06 NaN 2015-08
... ... ... ... ... ...
56809 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08
56840 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09
56855 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10
56862 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11
56881 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12

11177 rows × 5 columns
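For comparison, the arithmetic group median (which averages the two middle values when a group has an even number of rows) can be computed directly with groupby; a minimal sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({'g': ['a', 'a', 'b'], 'yld': [1.0, 2.0, 5.0]})
# groupby().median() computes the arithmetic median per group
med = toy.groupby('g')['yld'].median()
# group 'a' has an even number of rows: the arithmetic median is 1.5,
# while the row-picking approach above returns either 1.0 or 2.0
```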

# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file,)
cusips
cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.7500 N N N N 5.0
1 00077TAB0 150000.0 1993-10-13 2093-10-15 CDEB SUB N USA 7.1250 N N N N 6.0
2 001192AA1 300000.0 2001-02-23 2011-01-14 CDEB SEN N USA 7.1250 N N N N 117277.0
3 00163XAM2 175000.0 2006-08-10 2013-08-15 CMTN SEN N USA 5.9000 N N N N 351976.0
4 00206RBS0 1250000.0 2013-02-07 2016-02-12 CDEB SEN N USA 0.7411 N N N N 589049.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1055 961548AQ7 150000.0 1997-03-12 2027-03-15 CDEB SEN N USA 7.6500 N N N N 51090.0
1056 961548AS3 150000.0 1997-06-19 2027-06-15 CDEB SEN N USA 7.5000 N N N N 56219.0
1057 984121CK7 400000.0 2015-02-26 2020-09-01 CDEB SEN N USA 2.7500 N N N N 626119.0
1058 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.0000 N N N N 24061.0
1059 989701AU1 34680.0 2009-06-01 2014-05-15 CDEB SUB N USA 5.6500 N N N N 498722.0

1060 rows × 14 columns

yields_median_usa_gbr = pd.merge(yields_median, cusips, how='inner', right_on ='cusip_id', left_on='CUSIP_ID')
yields_median_usa_gbr
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
1 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
2 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
3 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
4 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11173 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11174 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11175 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11176 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0

11177 rows × 19 columns


# export
part1a_median_file = '../data/export/part1a_median_file.csv'
yields_median_usa_gbr.to_csv(part1a_median_file)

# # get median of each month using groupby and a lambda function
# import warnings
# with warnings.catch_warnings():
#     warnings.simplefilter("ignore", category=RuntimeWarning) # ignore RuntimeWarning
#     yields['median'] = yields.groupby(['CUSIP_ID', 'year_month'])[['yld_pt']].transform(lambda x: x.median())
# # check results of 00077TAB0
# yields[40:60]
# yields = yields.sort_values(by=['CUSIP_ID', 'year_month', 'yld_pt'])
# # check results of 00077TAB0
# yields[40:60]

💲2. CRSP on WRDS and 180-day stock return volatility

ref: https://support.eventstudy.com/hc/en-us/articles/212258983-CUSIP-6-8-or-9-characters-

CUSIP, which is controlled by Standard & Poor’s, is officially a 9-character string. The first 6 identify the issuer, the 7th and 8th identify the issue and the 9th is a check digit that is fully determined by the first 8 characters.

CRSP (the default data source for Eventus on WRDS and, when subscribed and installed, for Eventus for Windows and Linux) omits the check digit and uses an 8-character CUSIP field.

Common equity issues always have an 8th character of 0. As CRSP covers only common stocks, all CUSIPs in CRSP end in 0 as the 8th character. CUSIPs from other sources that have a non-zero digit in the 8th position are not common stock CUSIPs.
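The trimming and check-digit rules above can be sketched as follows; the check-digit routine follows the published CUSIP algorithm, and the example CUSIP (Apple common stock) is illustrative:

```python
def cusip_check_digit(base8):
    """Check digit implied by the first 8 CUSIP characters."""
    total = 0
    for i, ch in enumerate(base8):
        if ch.isdigit():
            v = int(ch)
        elif ch.isalpha():
            v = ord(ch.upper()) - ord('A') + 10
        else:
            v = {'*': 36, '@': 37, '#': 38}[ch]
        if i % 2 == 1:           # double every second character
            v *= 2
        total += v // 10 + v % 10  # sum the digits of each product
    return str((10 - total % 10) % 10)

full = '037833100'               # 9-character CUSIP (Apple common stock)
crsp_style = full[:8]            # CRSP drops the check digit
# the dropped 9th character is recoverable from the first 8
print(cusip_check_digit(crsp_style))   # matches full[-1]
print(crsp_style[7])                   # '0' => common equity issue
```

When matching TRACE CUSIPs against CRSP, truncating to the first 8 characters this way is the usual approach.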

# link crsp permco/permno from cusip_ids
import pandas as pd
links = '../data/crsp_cusip_link.csv'
links_data = pd.read_csv(links)
links_data
CUSIP PERMNO PERMCO trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 00077TAA2 84766 31989 20020711 20110318 20020102 20080424 20020711 20080424
1 00077TAA2 92355 28711 20120614 20150324 20071018 20200722 20120614 20150324
2 00077TAB0 84766 31989 20020906 20110328 20020102 20080424 20020906 20080424
3 00077TAB0 92355 28711 20120607 20150217 20071018 20200722 20120607 20150217
4 001192AA1 15553 116 20021206 20101216 20020102 20090208 20021206 20090208
... ... ... ... ... ... ... ... ... ...
1124 961548AS3 21186 21913 20020726 20141231 20020130 20150701 20020726 20141231
1125 961548AS3 21186 21913 20151112 20170524 20150702 20200930 20151112 20170524
1126 984121CK7 27983 21945 20150226 20200828 20020102 20200930 20150226 20200828
1127 98934KAB6 79363 29908 20020702 20200923 20020102 20200930 20020702 20200923
1128 989701AU1 84129 5057 20090731 20140414 19821101 20200930 20090731 20140414

1129 rows × 9 columns

import numpy as np
# read csv as pandas dataframe
# the download starts from June 2014, since a 180-day rolling standard deviation must be computed
# crspa_dsf = '../data/880fcbe48e8d30ce.csv'
# the download starts from April 2014, since a 180-day rolling standard deviation must be computed
crspa_dsf = '../data/e862f245484c3340.csv'
# read (PERMNO, date, PERMCO, RET)
dsf = pd.read_csv(crspa_dsf, usecols = [0, 1, 15, 48])
dsf
/Users/jade_mayer/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3146: DtypeWarning: Columns (48) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
PERMNO date PERMCO RET
0 10145 2014/04/15 22168 0.008689
1 10145 2014/04/16 22168 0.017337
2 10145 2014/04/17 22168 -0.002144
3 10145 2014/04/21 22168 0.000859
4 10145 2014/04/22 22168 0.001717
... ... ... ... ...
140596 93013 2017/12/22 53204 0.001886
140597 93013 2017/12/26 53204 -0.001345
140598 93013 2017/12/27 53204 0.006642
140599 93013 2017/12/28 53204 -0.007936
140600 93013 2017/12/29 53204 0.002696

140601 rows × 4 columns

Missing Value Codes in SAS

When looking up a holding period return (RET), or delisting return (DLRET), SAS may return a missing value code. These are indicated with a decimal point followed by a letter (e.g. “.E”, “.D”). The descriptions for the missing value codes in SAS can be found on the “Variable Description” page for CRSP Stock.
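An alternative to filtering the SAS codes row by row, sketched here with made-up values, is to coerce them to NaN in one step; pd.to_numeric with errors='coerce' also leaves RET as floats rather than strings:

```python
import pandas as pd

# RET arrives as strings because codes like 'B' and 'C' are mixed in
ret = pd.Series(['0.008689', 'C', '-0.002144', 'B'])
ret_num = pd.to_numeric(ret, errors='coerce')  # codes become NaN
clean = ret_num.dropna()                       # numeric returns only
```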

# missing value
# B means Off-exchange
subset = dsf.loc[dsf['RET'] == 'B'].count()
subset
PERMNO    0
date      0
PERMCO    0
RET       0
dtype: int64
# missing value
# C means No valid previous price
subset = dsf.loc[dsf['RET'] == 'C'].count()
subset
PERMNO    3
date      3
PERMCO    3
RET       3
dtype: int64
# remove missing data rows
dsf = dsf[dsf.RET != 'C']
# df = df[df.RET != 'B']
dsf
PERMNO date PERMCO RET
0 10145 2014/04/15 22168 0.008689
1 10145 2014/04/16 22168 0.017337
2 10145 2014/04/17 22168 -0.002144
3 10145 2014/04/21 22168 0.000859
4 10145 2014/04/22 22168 0.001717
... ... ... ... ...
140596 93013 2017/12/22 53204 0.001886
140597 93013 2017/12/26 53204 -0.001345
140598 93013 2017/12/27 53204 0.006642
140599 93013 2017/12/28 53204 -0.007936
140600 93013 2017/12/29 53204 0.002696

140598 rows × 4 columns

# calculate the 180-day rolling standard deviation

# dsf['rolling'] = 
rollings = dsf.groupby('PERMNO')['RET'].rolling(180).std()
# rollings_df = pd.DataFrame(data = [rollings.values] * len(rollings), columns = rollings.index)
rollings[(10145, 0)]
nan
dsf.index
Int64Index([     0,      1,      2,      3,      4,      5,      6,      7,
                 8,      9,
            ...
            140591, 140592, 140593, 140594, 140595, 140596, 140597, 140598,
            140599, 140600],
           dtype='int64', length=140598)
# join rolling data with source dataframe on (dsf.index, 'PERMNO')
dsf = dsf.merge(rollings.rename('rollings_'), on=[dsf.index, 'PERMNO'] ,left_index=True, right_index=True)
dsf
PERMNO date PERMCO RET rollings_
0 10145 2014/04/15 22168 0.008689 NaN
1 10145 2014/04/16 22168 0.017337 NaN
2 10145 2014/04/17 22168 -0.002144 NaN
3 10145 2014/04/21 22168 0.000859 NaN
4 10145 2014/04/22 22168 0.001717 NaN
... ... ... ... ... ...
140596 93013 2017/12/22 53204 0.001886 0.010455
140597 93013 2017/12/26 53204 -0.001345 0.010445
140598 93013 2017/12/27 53204 0.006642 0.010446
140599 93013 2017/12/28 53204 -0.007936 0.010463
140600 93013 2017/12/29 53204 0.002696 0.010431

140598 rows × 5 columns
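A simpler, equivalent way to attach the per-PERMNO rolling standard deviation is transform, which keeps the result aligned with the original row index and avoids the index-based merge. A sketch with toy data and a 3-day window standing in for 180:

```python
import pandas as pd

toy = pd.DataFrame({
    'PERMNO': [1] * 5 + [2] * 5,
    'RET': [0.01, -0.02, 0.03, 0.0, 0.015,
            0.002, 0.001, -0.001, 0.004, 0.0],
})
window = 3  # 180 in the assignment; 3 so the toy data fills a window
# transform returns a Series indexed like the input frame
toy['rollings_'] = toy.groupby('PERMNO')['RET'].transform(
    lambda s: s.rolling(window).std())
# the first window-1 rows of each PERMNO are NaN, as above
```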

dsf_cusip = pd.merge(dsf, links_data, on=['PERMNO'], how='inner')
dsf_cusip
PERMNO date PERMCO_x RET rollings_ CUSIP PERMCO_y trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 10145 2014/04/15 22168 0.008689 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
1 10145 2014/04/16 22168 0.017337 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
2 10145 2014/04/17 22168 -0.002144 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
3 10145 2014/04/21 22168 0.000859 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
4 10145 2014/04/22 22168 0.001717 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1001488 93013 2017/12/29 53204 0.002696 0.010431 03523TBK3 53204 20110707 20140623 20090916 20200930 20110707 20140623
1001489 93013 2017/12/29 53204 0.002696 0.010431 035240AC4 53204 20161220 20180725 20090916 20200930 20161220 20180725
1001490 93013 2017/12/29 53204 0.002696 0.010431 035242AF3 53204 20140122 20190131 20090916 20200930 20140122 20190131
1001491 93013 2017/12/29 53204 0.002696 0.010431 035242AK2 53204 20160113 20200930 20090916 20200930 20160113 20200930
1001492 93013 2017/12/29 53204 0.002696 0.010431 03524BAD8 53204 20140122 20170117 20090916 20200930 20140122 20170117

1001493 rows × 13 columns

dsf_cusip.loc[dsf_cusip['PERMNO'] == 10145]
PERMNO date PERMCO_x RET rollings_ CUSIP PERMCO_y trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 10145 2014/04/15 22168 0.008689 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
1 10145 2014/04/16 22168 0.017337 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
2 10145 2014/04/17 22168 -0.002144 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
3 10145 2014/04/21 22168 0.000859 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
4 10145 2014/04/22 22168 0.001717 NaN 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
... ... ... ... ... ... ... ... ... ... ... ... ... ...
931 10145 2017/12/22 22168 -0.001496 0.006667 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
932 10145 2017/12/26 22168 0.001759 0.006667 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
933 10145 2017/12/27 22168 0.001236 0.006667 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
934 10145 2017/12/28 22168 0.001169 0.006601 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024
935 10145 2017/12/29 22168 -0.004996 0.006600 438516BK1 22168 20161024 20191024 20020102 20200930 20161024 20191024

936 rows × 13 columns

# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file)
cusips
cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.7500 N N N N 5.0
1 00077TAB0 150000.0 1993-10-13 2093-10-15 CDEB SUB N USA 7.1250 N N N N 6.0
2 001192AA1 300000.0 2001-02-23 2011-01-14 CDEB SEN N USA 7.1250 N N N N 117277.0
3 00163XAM2 175000.0 2006-08-10 2013-08-15 CMTN SEN N USA 5.9000 N N N N 351976.0
4 00206RBS0 1250000.0 2013-02-07 2016-02-12 CDEB SEN N USA 0.7411 N N N N 589049.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1055 961548AQ7 150000.0 1997-03-12 2027-03-15 CDEB SEN N USA 7.6500 N N N N 51090.0
1056 961548AS3 150000.0 1997-06-19 2027-06-15 CDEB SEN N USA 7.5000 N N N N 56219.0
1057 984121CK7 400000.0 2015-02-26 2020-09-01 CDEB SEN N USA 2.7500 N N N N 626119.0
1058 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.0000 N N N N 24061.0
1059 989701AU1 34680.0 2009-06-01 2014-05-15 CDEB SUB N USA 5.6500 N N N N 498722.0

1060 rows × 14 columns

dsf_cusip_country = pd.merge(dsf_cusip, cusips, left_on=['CUSIP'], right_on=['cusip_id'], how='inner')
dsf_cusip_country.loc[dsf_cusip['PERMNO'] == 10145]
PERMNO date PERMCO_x RET rollings_ CUSIP PERMCO_y trace_startdt trace_enddt crsp_startdt ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 10145 2014/04/15 22168 0.008689 NaN 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
1 10145 2014/04/16 22168 0.017337 NaN 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
2 10145 2014/04/17 22168 -0.002144 NaN 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
3 10145 2014/04/21 22168 0.000859 NaN 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
4 10145 2014/04/22 22168 0.001717 NaN 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
931 10145 2017/12/22 22168 -0.001496 0.006667 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
932 10145 2017/12/26 22168 0.001759 0.006667 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
933 10145 2017/12/27 22168 0.001236 0.006667 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
934 10145 2017/12/28 22168 0.001169 0.006601 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
935 10145 2017/12/29 22168 -0.004996 0.006600 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0

936 rows × 27 columns

dsf_cusip_country_nnan = dsf_cusip_country.dropna()

dsf_cusip_country_nnan
PERMNO date PERMCO_x RET rollings_ CUSIP PERMCO_y trace_startdt trace_enddt crsp_startdt ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
179 10145 2014/12/30 22168 -0.009344 0.010120 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
180 10145 2014/12/31 22168 -0.007943 0.010122 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
181 10145 2015/01/02 22168 0.003103 0.010046 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
182 10145 2015/01/05 22168 -0.019056 0.010150 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
183 10145 2015/01/06 22168 -0.002339 0.010152 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1001488 93013 2017/12/22 53204 0.001886 0.010455 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001489 93013 2017/12/26 53204 -0.001345 0.010445 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001490 93013 2017/12/27 53204 0.006642 0.010446 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001491 93013 2017/12/28 53204 -0.007936 0.010463 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001492 93013 2017/12/29 53204 0.002696 0.010431 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0

786357 rows × 27 columns

dsf_cusip_country_ = dsf_cusip_country_nnan.drop_duplicates(subset=['cusip_id', 'date'], keep='last')


dsf_cusip_country_
PERMNO date PERMCO_x RET rollings_ CUSIP PERMCO_y trace_startdt trace_enddt crsp_startdt ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
179 10145 2014/12/30 22168 -0.009344 0.010120 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
180 10145 2014/12/31 22168 -0.007943 0.010122 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
181 10145 2015/01/02 22168 0.003103 0.010046 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
182 10145 2015/01/05 22168 -0.019056 0.010150 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
183 10145 2015/01/06 22168 -0.002339 0.010152 438516BK1 22168 20161024 20191024 20020102 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1001488 93013 2017/12/22 53204 0.001886 0.010455 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001489 93013 2017/12/26 53204 -0.001345 0.010445 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001490 93013 2017/12/27 53204 0.006642 0.010446 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001491 93013 2017/12/28 53204 -0.007936 0.010463 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0
1001492 93013 2017/12/29 53204 0.002696 0.010431 03524BAD8 53204 20140122 20170117 20090916 ... CDEB SEN N USA 1.07567 N N N N 605853.0

716846 rows × 27 columns

dsf_cusip_country_.to_csv('../data/export/part1b_180days_rolling_window.csv')

💲3. PRCCQ from Compustat

ref: https://bbs.pinggu.org/thread-3588010-1-1.html

Research on market leverage and book leverage goes back to Fama and French's 1992 paper The Cross-Section of Expected Stock Returns (by the same Fama and French behind the three-factor model).

The paper's original definition of the two leverage measures reads: We use two leverage variables, the ratio of book assets to market equity, A/ME, and the ratio of book assets to book equity, A/BE. We interpret A/ME as a measure of market leverage, while A/BE is a measure of book leverage.

Here A is Total Assets, ME is the Market Value of Equity, and BE is the Book Value of Equity. Market leverage and book leverage are two ways of measuring a firm's financial leverage; what we ordinarily call financial leverage usually refers to book leverage.

Incidentally, the paper concludes that stock returns are positively related to market leverage, and uncorrelated or even slightly negatively correlated with book leverage.
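In code, the two Fama-French ratios reduce to two divisions; the numbers below are made up purely for illustration:

```python
# A = total assets, ME = market value of equity, BE = book value of equity
A, ME, BE = 50_000.0, 80_000.0, 20_000.0
market_leverage = A / ME  # A/ME
book_leverage = A / BE    # A/BE
```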

ref: https://www.indeed.com/career-advice/career-development/how-to-calculate-total-debt

Total debt is calculated by adding up a company’s liabilities, or debts, which are categorized as short and long-term debt. Financial lenders or business leaders may look at a company’s balance sheet to factor in the debt ratio to make informed decisions about future loan options. They calculate the debt ratio by taking the total debt and dividing it by the total assets.

Outstanding Debt

Debt that has not yet been repaid in full. For example, if one borrows $10,000 and has paid back $2,000, the outstanding debt is $8,000. In general, interest is calculated over the outstanding debt rather than the original amount borrowed.
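The two calculations described above, using the article's example figures plus a made-up balance sheet for the debt ratio:

```python
# debt ratio = total debt / total assets
short_term_debt = 2_000.0
long_term_debt = 8_000.0
total_assets = 25_000.0            # illustrative figure
total_debt = short_term_debt + long_term_debt
debt_ratio = total_debt / total_assets

# outstanding debt = amount borrowed - amount repaid
borrowed, repaid = 10_000.0, 2_000.0
outstanding_debt = borrowed - repaid   # 8,000 as in the example
```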

ref: https://www.lawinsider.com/dictionary/book-leverage-ratio

Book Leverage Ratio means the ratio of Total Consolidated Long Term Debt to Total Assets, as shown in the applicable Financial Statements for Guarantor A for any accounting period and determined in accordance with GAAP.

ref: https://www.quora.com/What-are-the-current-liabilities-in-a-balance-sheet

Liability: Liability is something which the business owes. For example, a loan taken from a bank.

Classification of Liabilities-

Current liabilities: Current liabilities are the liabilities which the business has to pay within a year. These are short-term liabilities. For example, trade creditors. Trade Creditors are the suppliers from whom we purchase the goods on credit. Usually, the payment to trade creditors is made within one year. Other examples of current liabilities are outstanding(unpaid) salaries, tax payable, etc.

Non-Current Liabilities: Non-Current Liabilities are long term liabilities which are not repayable within one year. For example, Long term loan taken for say five years.

Contingent Liability: Contingent Liability is that kind of a liability which is non-existent as on date, but it may become an actual liability in the future. For example, if a customer has filed a suit against the company for some compensation. This can become an actual liability in the future if the firm loses the case. However, as on date, it is not a liability as the outcome is not known today.

More details on the CRSP/Compustat link table:

http://kaichen.work/?p=138

gvkey_crsp_link = '../data/gvkey_crsp_link.csv'
gvkey = pd.read_csv(gvkey_crsp_link)
# gvkey = gvkey.dropna()
# gvkey_ = gvkey['gvkey']
# gvkey_.to_csv('../gvkeys_unique.csv')
gvkey
GVKEY LINKPRIM LIID LINKTYPE LPERMNO LPERMCO LINKDT LINKENDDT conm tic ... ipodate dldte STKO FYRC GSECTOR GGROUP GIND GSUBIND SPCINDCD SPCSECCD
0 1300 P 01 LC 10145 22168 19620131 E HONEYWELL INTERNATIONAL INC HON ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
1 1318 P 01 LU 41443 20057 19650118 20071130 ALLTEL CORP AT.2 ... NaN 20071119.0 0 12 50.0 5010.0 501020.0 50102010.0 715 974
2 1440 P 01 LC 24109 20077 19620131 E AMERICAN ELECTRIC POWER CO AEP ... NaN NaN 0 12 55.0 5510.0 551010.0 55101010.0 705 700
3 1440 C 00X LC 24109 20077 19510101 19620130 AMERICAN ELECTRIC POWER CO AEP ... NaN NaN 0 12 55.0 5510.0 551010.0 55101010.0 705 700
4 1447 C 01 LU 59176 90 19721214 E AMERICAN EXPRESS CO AXP ... NaN NaN 0 12 40.0 4020.0 402020.0 40202010.0 850 800
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
231 209382 P 91 LC 89330 37665 20020322 E VALE SA VALE ... NaN NaN 0 12 15.0 1510.0 151040.0 15104050.0 360 970
232 212340 P 90 LU 85888 16024 19980327 20190131 SHIRE PLC SHPG ... 19960215.0 20190109.0 0 12 35.0 3520.0 352010.0 35201010.0 285 905
233 241216 P 90 LC 88822 41009 20001113 E SYNGENTA AG SYT ... 20001113.0 20180110.0 0 12 15.0 1510.0 151010.0 15101030.0 167 970
234 241637 P 90 LC 93013 53204 20090916 E ANHEUSER-BUSCH INBEV BUD ... NaN NaN 0 12 30.0 3020.0 302010.0 30201010.0 140 978
235 252940 P 90 LC 75811 22107 20010402 E MITSUBISHI UFJ FINANCIAL GRP MUFG ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800

236 rows × 47 columns

# fundq = '../data/7cf711fc5ea7973d.csv'
fundq = '../data/309e6ea3f87e4430.csv'

# open fundq as dataframe
df = pd.read_csv(fundq)
df
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr datafqtr atq cshoq dlttq lctq costat prccq
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 2015Q1 45357.000 781.707 5661.000 15432.0 A 104.31
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 2015Q2 46412.000 781.762 5562.000 15574.0 A 101.97
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 2015Q3 46625.000 770.691 5599.000 16367.0 A 94.69
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 2015Q4 49316.000 770.400 5554.000 18371.0 A 103.57
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 2016Q1 50365.000 762.115 9700.000 15659.0 A 112.05
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1851 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 2016Q3 2586542.894 13429.631 219398.261 NaN A 6.16
1852 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 2016Q4 2722353.780 13429.944 241134.298 NaN A 6.34
1853 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 2017Q1 NaN NaN NaN NaN A 6.75
1854 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 2017Q2 2666912.107 13289.784 267516.335 NaN A 6.44
1855 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 2017Q3 NaN NaN NaN NaN A 7.27

1856 rows × 17 columns

# compute total outstanding debt
df['total_debt'] = df['dlttq'] + df['lctq']
# compute book leverage
df['book_leverage'] = df['total_debt'] / df['atq']
df
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr datafqtr atq cshoq dlttq lctq costat prccq total_debt book_leverage
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 2015Q1 45357.000 781.707 5661.000 15432.0 A 104.31 21093.0 0.465044
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 2015Q2 46412.000 781.762 5562.000 15574.0 A 101.97 21136.0 0.455399
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 2015Q3 46625.000 770.691 5599.000 16367.0 A 94.69 21966.0 0.471121
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 2015Q4 49316.000 770.400 5554.000 18371.0 A 103.57 23925.0 0.485137
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 2016Q1 50365.000 762.115 9700.000 15659.0 A 112.05 25359.0 0.503504
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1851 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 2016Q3 2586542.894 13429.631 219398.261 NaN A 6.16 NaN NaN
1852 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 2016Q4 2722353.780 13429.944 241134.298 NaN A 6.34 NaN NaN
1853 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 2017Q1 NaN NaN NaN NaN A 6.75 NaN NaN
1854 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 2017Q2 2666912.107 13289.784 267516.335 NaN A 6.44 NaN NaN
1855 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 2017Q3 NaN NaN NaN NaN A 7.27 NaN NaN

1856 rows × 19 columns
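Note that dlttq + lctq is NaN whenever either component is missing, which is why the gvkey 252940 rows above have no total_debt. As a sketch: only if a missing lctq can legitimately be read as zero would Series.add with fill_value avoid the propagation; otherwise keeping the NaN is the safer choice.

```python
import pandas as pd

dlttq = pd.Series([5661.0, 219398.261])
lctq = pd.Series([15432.0, float('nan')])
strict = dlttq + lctq                    # NaN where lctq is missing
lenient = dlttq.add(lctq, fill_value=0)  # treats missing lctq as zero
```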

# compute market capitalization
df['market_capitalization'] = (df['prccq'] * df['cshoq'])
# compute market leverage
df['market_leverage'] = df['total_debt'] / df['market_capitalization']
df
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... atq cshoq dlttq lctq costat prccq total_debt book_leverage market_capitalization market_leverage
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... 45357.000 781.707 5661.000 15432.0 A 104.31 21093.0 0.465044 81539.85717 0.258683
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... 46412.000 781.762 5562.000 15574.0 A 101.97 21136.0 0.455399 79716.27114 0.265140
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... 46625.000 770.691 5599.000 16367.0 A 94.69 21966.0 0.471121 72976.73079 0.301000
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... 49316.000 770.400 5554.000 18371.0 A 103.57 23925.0 0.485137 79790.32800 0.299848
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... 50365.000 762.115 9700.000 15659.0 A 112.05 25359.0 0.503504 85394.98575 0.296961
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1851 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 ... 2586542.894 13429.631 219398.261 NaN A 6.16 NaN NaN 82726.52696 NaN
1852 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 ... 2722353.780 13429.944 241134.298 NaN A 6.34 NaN NaN 85145.84496 NaN
1853 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 ... NaN NaN NaN NaN A 6.75 NaN NaN NaN NaN
1854 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 ... 2666912.107 13289.784 267516.335 NaN A 6.44 NaN NaN 85586.20896 NaN
1855 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 ... NaN NaN NaN NaN A 7.27 NaN NaN NaN NaN

1856 rows × 21 columns
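The two leverage ratios above reduce to simple column arithmetic. A minimal sketch reproducing the first row's figures (inputs copied from the table above; total debt is long-term debt plus current liabilities, consistent with the total_debt column shown):

```python
import pandas as pd

# one row of the quarterly data (gvkey 1300, 2015 Q1)
df = pd.DataFrame({'dlttq': [5661.0],     # long-term debt
                   'lctq': [15432.0],     # current liabilities
                   'atq': [45357.0],      # total assets
                   'prccq': [104.31],     # quarter-end share price
                   'cshoq': [781.707]})   # shares outstanding (millions)
df['total_debt'] = df['dlttq'] + df['lctq']
df['book_leverage'] = df['total_debt'] / df['atq']
df['market_capitalization'] = df['prccq'] * df['cshoq']
df['market_leverage'] = df['total_debt'] / df['market_capitalization']
print(df[['book_leverage', 'market_leverage']].round(6))
```

This reproduces book_leverage ≈ 0.465044 and market_leverage ≈ 0.258683 from row 0 above.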

# merge the quarterly fundamentals with the company header data on gvkey
leverage = pd.merge(df, gvkey, left_on='gvkey', right_on='GVKEY')
leverage
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... ipodate dldte STKO FYRC GSECTOR GGROUP GIND GSUBIND SPCINDCD SPCSECCD
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... NaN NaN 0 12 20.0 2010.0 201050.0 20105010.0 355 925
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2383 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800
2384 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800
2385 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800
2386 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800
2387 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 ... 20010401.0 NaN 0 3 40.0 4010.0 401010.0 40101010.0 810 800

2388 rows × 68 columns

links_data
CUSIP PERMNO PERMCO trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 00077TAA2 84766 31989 20020711 20110318 20020102 20080424 20020711 20080424
1 00077TAA2 92355 28711 20120614 20150324 20071018 20200722 20120614 20150324
2 00077TAB0 84766 31989 20020906 20110328 20020102 20080424 20020906 20080424
3 00077TAB0 92355 28711 20120607 20150217 20071018 20200722 20120607 20150217
4 001192AA1 15553 116 20021206 20101216 20020102 20090208 20021206 20090208
... ... ... ... ... ... ... ... ... ...
1124 961548AS3 21186 21913 20020726 20141231 20020130 20150701 20020726 20141231
1125 961548AS3 21186 21913 20151112 20170524 20150702 20200930 20151112 20170524
1126 984121CK7 27983 21945 20150226 20200828 20020102 20200930 20150226 20200828
1127 98934KAB6 79363 29908 20020702 20200923 20020102 20200930 20020702 20200923
1128 989701AU1 84129 5057 20090731 20140414 19821101 20200930 20090731 20140414

1129 rows × 9 columns

# merge with the TRACE-CRSP link table on PERMNO
leverge_ = pd.merge(leverage, links_data, left_on='LPERMNO', right_on='PERMNO')
leverge_
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... SPCSECCD CUSIP PERMNO PERMCO trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
15093 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15094 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15095 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15096 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15097 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203

15098 rows × 77 columns

# keep one row per bond-quarter: drop duplicate (CUSIP, datadate) pairs, keeping the last
leverge_unique = leverge_.drop_duplicates(subset=['CUSIP', 'datadate'], keep='last')

leverge_unique
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... SPCSECCD CUSIP PERMNO PERMCO trace_startdt trace_enddt crsp_startdt crsp_enddt link_startdt link_enddt
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... 925 438516BK1 10145 22168 20161024 20191024 20020102 20200930 20161024 20191024
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
15093 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15094 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15095 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15096 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203
15097 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 ... 800 904851AA0 75811 22107 20120816 20131203 20051003 20180401 20120816 20131203

11817 rows × 77 columns
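`drop_duplicates` with `keep='last'` retains only the final row for each (CUSIP, datadate) pair; a toy illustration:

```python
import pandas as pd

df = pd.DataFrame({'CUSIP': ['A', 'A', 'B'],
                   'datadate': ['2015/03/31', '2015/03/31', '2015/06/30'],
                   'val': [1, 2, 3]})
# the first ('A', '2015/03/31') row is dropped; the last one survives
out = df.drop_duplicates(subset=['CUSIP', 'datadate'], keep='last')
print(out['val'].tolist())  # [2, 3]
```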

# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file)
cusips
cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.7500 N N N N 5.0
1 00077TAB0 150000.0 1993-10-13 2093-10-15 CDEB SUB N USA 7.1250 N N N N 6.0
2 001192AA1 300000.0 2001-02-23 2011-01-14 CDEB SEN N USA 7.1250 N N N N 117277.0
3 00163XAM2 175000.0 2006-08-10 2013-08-15 CMTN SEN N USA 5.9000 N N N N 351976.0
4 00206RBS0 1250000.0 2013-02-07 2016-02-12 CDEB SEN N USA 0.7411 N N N N 589049.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1055 961548AQ7 150000.0 1997-03-12 2027-03-15 CDEB SEN N USA 7.6500 N N N N 51090.0
1056 961548AS3 150000.0 1997-06-19 2027-06-15 CDEB SEN N USA 7.5000 N N N N 56219.0
1057 984121CK7 400000.0 2015-02-26 2020-09-01 CDEB SEN N USA 2.7500 N N N N 626119.0
1058 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.0000 N N N N 24061.0
1059 989701AU1 34680.0 2009-06-01 2014-05-15 CDEB SUB N USA 5.6500 N N N N 498722.0

1060 rows × 14 columns

# attach bond characteristics (including country of domicile) by CUSIP
leverge_unique_country = pd.merge(leverge_unique, cusips, left_on='CUSIP', right_on='cusip_id')
leverge_unique_country
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... CDEB SEN N USA 2.54658 N N N N 663731.0
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... CDEB SEN N USA 2.54658 N N N N 663731.0
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... CDEB SEN N USA 2.54658 N N N N 663731.0
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11812 252940 2016/12/31 2016 3 INDL C D STD USD 2016Q4 ... CDEB SUB N USA 5.25000 N N N N 176262.0
11813 252940 2017/03/31 2016 4 INDL C D STD USD 2017Q1 ... CDEB SUB N USA 5.25000 N N N N 176262.0
11814 252940 2017/06/30 2017 1 INDL C D STD USD 2017Q2 ... CDEB SUB N USA 5.25000 N N N N 176262.0
11815 252940 2017/09/30 2017 2 INDL C D STD USD 2017Q3 ... CDEB SUB N USA 5.25000 N N N N 176262.0
11816 252940 2017/12/31 2017 3 INDL C D STD USD 2017Q4 ... CDEB SUB N USA 5.25000 N N N N 176262.0

11817 rows × 91 columns

# keep only rows with a non-missing market_leverage
leverge_unique_country = leverge_unique_country[leverge_unique_country['market_leverage'].notna()]
leverge_unique_country
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... CDEB SEN N USA 2.54658 N N N N 663731.0
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... CDEB SEN N USA 2.54658 N N N N 663731.0
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... CDEB SEN N USA 2.54658 N N N N 663731.0
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11796 241637 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11798 241637 2016/06/30 2016 2 INDL C D STD USD 2016Q2 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11800 241637 2016/12/31 2016 4 INDL C D STD USD 2016Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11802 241637 2017/06/30 2017 2 INDL C D STD USD 2017Q2 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11804 241637 2017/12/31 2017 4 INDL C D STD USD 2017Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0

3205 rows × 91 columns

# likewise keep only rows with a non-missing book_leverage
leverge_unique_country = leverge_unique_country[leverge_unique_country['book_leverage'].notna()]
leverge_unique_country
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr ... bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 1300 2015/03/31 2015 1 INDL C D STD USD 2015Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
1 1300 2015/06/30 2015 2 INDL C D STD USD 2015Q2 ... CDEB SEN N USA 2.54658 N N N N 663731.0
2 1300 2015/09/30 2015 3 INDL C D STD USD 2015Q3 ... CDEB SEN N USA 2.54658 N N N N 663731.0
3 1300 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... CDEB SEN N USA 2.54658 N N N N 663731.0
4 1300 2016/03/31 2016 1 INDL C D STD USD 2016Q1 ... CDEB SEN N USA 2.54658 N N N N 663731.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11796 241637 2015/12/31 2015 4 INDL C D STD USD 2015Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11798 241637 2016/06/30 2016 2 INDL C D STD USD 2016Q2 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11800 241637 2016/12/31 2016 4 INDL C D STD USD 2016Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11802 241637 2017/06/30 2017 2 INDL C D STD USD 2017Q2 ... CDEB SEN N USA 1.07567 N N N N 605853.0
11804 241637 2017/12/31 2017 4 INDL C D STD USD 2017Q4 ... CDEB SEN N USA 1.07567 N N N N 605853.0

3205 rows × 91 columns

leverge_unique_country.to_csv('../data/export/part1c_leverge.csv')

💲4. Locate the latest credit rating

# monthly yield data obtained in Part 1(a)
yields_median_usa_gbr
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
1 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
2 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
3 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
4 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11173 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11174 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11175 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11176 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0

11177 rows × 19 columns

# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file, usecols=[0, 13])
cusips
cusip_id issue_id
0 00077TAA2 5.0
1 00077TAB0 6.0
2 001192AA1 117277.0
3 00163XAM2 351976.0
4 00206RBS0 589049.0
... ... ...
1055 961548AQ7 51090.0
1056 961548AS3 56219.0
1057 984121CK7 626119.0
1058 98934KAB6 24061.0
1059 989701AU1 498722.0

1060 rows × 2 columns

cusips = cusips.sort_values(by=['issue_id'])
# drop cusip 40056E6B9 because its issue_id is missing
cusips.dropna(inplace=True)
# astype returns a new Series, so assign it back to convert issue_id to integer
cusips['issue_id'] = cusips['issue_id'].astype('int64')
cusips
cusip_id issue_id
0 00077TAA2 5
1 00077TAB0 6
40 030955AB4 1057
128 126168AB9 3024
391 386088AB4 11624
... ... ...
811 61769HDQ5 830907
966 80283LAX1 832065
614 40056FJX4 832139
615 40056FKS3 834537
846 674599CT0 843393

1059 rows × 2 columns
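Note that `Series.astype` returns a converted copy rather than changing the dtype in place, so the result must be assigned back for the conversion to stick:

```python
import pandas as pd

s = pd.Series([5.0, 6.0])
s.astype('int64')      # returns a new Series; s itself is unchanged
print(s.dtype)         # float64
s = s.astype('int64')  # assign the result back to actually convert
print(s.dtype)         # int64
```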

# read csv as pandas dataframe
# acf351b_ratings.csv
rating_file = '../data/acf351b_ratings.csv'
ratings = pd.read_csv(rating_file)
ratings
issue_id rating_type rating_date rating
0 5 DPR 5/20/1993 NR
1 5 MR 5/24/2017 NR
2 5 FR 12/14/2018 A-
3 5 SPR 5/16/2019 BB+
4 6 DPR 10/13/1993 NR
... ... ... ... ...
3231 834537 FR 6/25/2019 NR
3232 834537 MR 8/1/2019 A3
3233 843393 MR 8/1/2019 Baa3
3234 843393 FR 8/6/2019 NR
3235 843393 SPR 8/8/2019 BBB

3236 rows × 4 columns

# change dtype of rating_date from string to date
ratings['rating_date'] = pd.to_datetime(ratings['rating_date'])
ratings
issue_id rating_type rating_date rating
0 5 DPR 1993-05-20 NR
1 5 MR 2017-05-24 NR
2 5 FR 2018-12-14 A-
3 5 SPR 2019-05-16 BB+
4 6 DPR 1993-10-13 NR
... ... ... ... ...
3231 834537 FR 2019-06-25 NR
3232 834537 MR 2019-08-01 A3
3233 843393 MR 2019-08-01 Baa3
3234 843393 FR 2019-08-06 NR
3235 843393 SPR 2019-08-08 BBB

3236 rows × 4 columns

# merge the two dataframes with an outer join so unmatched rows are kept
ratings_of_cusips = pd.merge(cusips, ratings, how='outer', on ='issue_id')
ratings_of_cusips
cusip_id issue_id rating_type rating_date rating
0 00077TAA2 5.0 DPR 1993-05-20 NR
1 00077TAA2 5.0 MR 2017-05-24 NR
2 00077TAA2 5.0 FR 2018-12-14 A-
3 00077TAA2 5.0 SPR 2019-05-16 BB+
4 00077TAB0 6.0 DPR 1993-10-13 NR
... ... ... ... ... ...
3231 674599CT0 843393.0 FR 2019-08-06 NR
3232 674599CT0 843393.0 SPR 2019-08-08 BBB
3233 NaN 785195.0 SPR 2018-10-26 NR
3234 NaN 785195.0 FR 2018-10-26 NR
3235 NaN 785195.0 MR 2019-05-16 A3

3236 rows × 5 columns
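With `how='outer'`, ratings whose issue_id has no matching CUSIP survive the merge with `cusip_id` set to NaN (as for issue 785195 above); a small sketch:

```python
import pandas as pd

cusips_toy = pd.DataFrame({'cusip_id': ['00077TAA2'], 'issue_id': [5]})
ratings_toy = pd.DataFrame({'issue_id': [5, 785195], 'rating': ['NR', 'A3']})
# outer join: issue 785195 has no CUSIP, so its cusip_id becomes NaN
merged = pd.merge(cusips_toy, ratings_toy, how='outer', on='issue_id')
print(merged)
```

An inner join would silently drop such rows, so the row count (3236 above) is how the outer join makes unmatched ratings visible.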

# test
# get the cusip_id of the first row
for _, row in yields_median_usa_gbr.iterrows():
    print(row['CUSIP_ID'])
    break
00077TAA2
# test
# locate the ratings for the first yield row
for _, row in yields_median_usa_gbr.iterrows():
    ratings_of_row = ratings_of_cusips.loc[ratings_of_cusips['cusip_id'] == row['CUSIP_ID']]
    date_of_row = row['trd_exctn_dt']
    break
print(ratings_of_row)
    cusip_id  issue_id rating_type rating_date rating
0  00077TAA2       5.0         DPR  1993-05-20     NR
1  00077TAA2       5.0          MR  2017-05-24     NR
2  00077TAA2       5.0          FR  2018-12-14     A-
3  00077TAA2       5.0         SPR  2019-05-16    BB+
# test
# locate the position of the latest credit rating before the trade date
print(ratings_of_row['rating_date'].sort_values())
print(date_of_row)
ind = ratings_of_row['rating_date'].sort_values().searchsorted(date_of_row)
ind-1
0   1993-05-20
1   2017-05-24
2   2018-12-14
3   2019-05-16
Name: rating_date, dtype: datetime64[ns]
2015-01-26 00:00:00
0
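`searchsorted` returns the position where the trade date would be inserted into the sorted rating dates, so the entry just before that position is the latest rating on or before the trade. A self-contained sketch of the lookup above:

```python
import pandas as pd

rating_dates = pd.Series(pd.to_datetime(
    ['1993-05-20', '2017-05-24', '2018-12-14', '2019-05-16']))
trade_date = pd.Timestamp('2015-01-26')
# insertion point that keeps the series sorted; the previous entry is
# the most recent rating issued on or before the trade date
ind = rating_dates.searchsorted(trade_date)
print(int(ind) - 1)  # 0 -> the 1993-05-20 'NR' rating
```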
# the monthly yield data generated in Part 1(a)
yields_median_usa_gbr
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month cusip_id offering_amt offering_date maturity bond_type security_level yankee country_domicile coupon asset_backed putable convertible callable issue_id
0 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
1 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
2 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
3 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
4 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 00077TAA2 250000.0 1993-05-20 2023-05-15 CDEB SUB N USA 7.75 N N N N 5.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11173 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11174 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11175 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0
11176 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 98934KAB6 300000.0 1993-11-09 2023-11-15 CDEB SEN N USA 7.00 N N N N 24061.0

11177 rows × 19 columns

# get the latest rating for each row in the monthly yield data
# (note: this list reuses the name `ratings`, shadowing the ratings dataframe above)
ratings = []
for _, row in yields_median_usa_gbr.iterrows():
    # ratings of this bond, sorted by date so that positional indexing
    # below lines up with the searchsorted position
    ratings_of_row = (ratings_of_cusips
                      .loc[ratings_of_cusips['cusip_id'] == row['CUSIP_ID']]
                      .sort_values('rating_date'))
    # trade execution date of the row
    date_of_row = row['trd_exctn_dt']
    # insertion point of the trade date among the sorted rating dates;
    # the entry just before it is the latest rating on or before the trade
    # (every bond carries an initial rating at issuance, so the position is >= 1)
    ind_to_insert = ratings_of_row['rating_date'].searchsorted(date_of_row)
    latest = ratings_of_row.iloc[ind_to_insert - 1]
    ratings.append({'cusip': row['CUSIP_ID'],
                    'trd_exctn_dt': row['trd_exctn_dt'],
                    'country': row['country_domicile'],
                    'rating type': latest['rating_type'],
                    'rating date': latest['rating_date'],
                    'rating': latest['rating']})
# check results
# the length of ratings list is equal to the length of monthly yield dataframe
len(ratings)
11177
ratings[0]
{'cusip': '00077TAA2',
 'trd_exctn_dt': Timestamp('2015-01-26 00:00:00'),
 'country': 'USA',
 'rating type': 'DPR',
 'rating date': Timestamp('1993-05-20 00:00:00'),
 'rating': 'NR'}
# write results
import csv

# newline='' prevents csv from inserting blank lines on Windows
with open('../data/export/ratings_with_country.csv', 'w', newline='') as f:
    # build the writer once from the dict keys, write the header, then all rows
    w = csv.DictWriter(f, ratings[0].keys())
    w.writeheader()
    w.writerows(ratings)
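Since `ratings` is a list of flat dicts, the same file could also be written in one call via pandas (a sketch with a toy record standing in for the real list; a file path such as the export path above works in place of the buffer):

```python
import io
import pandas as pd

records = [{'cusip': '00077TAA2', 'country': 'USA', 'rating': 'NR'}]  # toy stand-in
buf = io.StringIO()
# DataFrame handles the header row and quoting automatically
pd.DataFrame(records).to_csv(buf, index=False)
print(buf.getvalue())
```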

💲5. Compute monthly credit spread

5.1 Time to maturity

Credit spread

In the bond-market sense used here, the credit spread is the difference between a corporate bond's yield and the yield of a risk-free benchmark (e.g. a Treasury zero-coupon bond) of the same maturity; it compensates investors for bearing the issuer's default risk. (The options-trading strategy of the same name is an unrelated concept.)
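In this bond-market sense the spread is simple subtraction of maturity-matched yields; an illustrative sketch (the numbers are made up, not taken from the data):

```python
# corporate bond yield and a maturity-matched risk-free zero-coupon yield, in percent
corporate_yield = 2.96
risk_free_yield = 2.25   # e.g. the zero-coupon yield at the bond's time to maturity
credit_spread = round(corporate_yield - risk_free_yield, 2)
print(credit_spread)  # 0.71, i.e. 71 basis points
```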

What Is a Maturity Date?

The maturity date is the date on which the principal amount of a note, draft, acceptance bond or other debt instrument becomes due. On this date, which is generally printed on the certificate of the instrument in question, the principal investment is repaid to the investor, while the interest payments that were regularly paid out during the life of the bond, cease to roll in. The maturity date also refers to the termination date (due date) on which an installment loan must be paid back in full.

ref: https://www.investopedia.com/terms/m/maturitydate.asp

# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file, usecols=[0, 3, 7])
cusips
cusip_id maturity country_domicile
0 00077TAA2 2023-05-15 USA
1 00077TAB0 2093-10-15 USA
2 001192AA1 2011-01-14 USA
3 00163XAM2 2013-08-15 USA
4 00206RBS0 2016-02-12 USA
... ... ... ...
1055 961548AQ7 2027-03-15 USA
1056 961548AS3 2027-06-15 USA
1057 984121CK7 2020-09-01 USA
1058 98934KAB6 2023-11-15 USA
1059 989701AU1 2014-05-15 USA

1060 rows × 3 columns

# the bond transaction data from Part 1(a)
yields_median
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month cusip_id
1198533 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 00077TAA2
1198535 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 00077TAA2
111592 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 00077TAA2
111594 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 00077TAA2
111597 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 00077TAA2
... ... ... ... ... ... ...
56809 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 98934KAB6
56840 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 98934KAB6
56855 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 98934KAB6
56862 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 98934KAB6
56881 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 98934KAB6

11177 rows × 6 columns

# copy CUSIP_ID into a lowercase cusip_id column to merge on;
# assign() returns a new dataframe, which avoids SettingWithCopyWarning
yields_median = yields_median.assign(cusip_id=yields_median['CUSIP_ID'])
yields_median
CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month cusip_id
1198533 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 00077TAA2
1198535 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 00077TAA2
111592 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 00077TAA2
111594 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 00077TAA2
111597 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 00077TAA2
... ... ... ... ... ... ...
56809 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 98934KAB6
56840 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 98934KAB6
56855 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 98934KAB6
56862 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 98934KAB6
56881 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 98934KAB6

11177 rows × 6 columns

# merge the two dataframes with an inner join on cusip_id
transactions_maturity = pd.merge(cusips, yields_median, how='inner', on ='cusip_id')
transactions_maturity
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month
0 00077TAA2 2023-05-15 USA 00077TAA2 2015-01-26 16:01:58 NaN 2015-01
1 00077TAA2 2023-05-15 USA 00077TAA2 2015-03-24 16:37:01 NaN 2015-03
2 00077TAA2 2023-05-15 USA 00077TAA2 2015-05-01 11:03:37 NaN 2015-05
3 00077TAA2 2023-05-15 USA 00077TAA2 2015-07-27 9:14:28 NaN 2015-07
4 00077TAA2 2023-05-15 USA 00077TAA2 2015-08-20 12:06:06 NaN 2015-08
... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12

11177 rows × 8 columns

transactions_maturity.dtypes
cusip_id                    object
maturity                    object
country_domicile            object
CUSIP_ID                    object
trd_exctn_dt        datetime64[ns]
trd_exctn_tm                object
yld_pt                     float64
year_month               period[M]
dtype: object
# transform maturity to datetime
transactions_maturity['maturity'] = pd.to_datetime(transactions_maturity['maturity'])
transactions_maturity.dtypes
cusip_id                    object
maturity            datetime64[ns]
country_domicile            object
CUSIP_ID                    object
trd_exctn_dt        datetime64[ns]
trd_exctn_tm                object
yld_pt                     float64
year_month               period[M]
dtype: object
import numpy as np
# time to maturity as a timedelta
transactions_maturity['time_to_maturity'] = transactions_maturity['maturity'] - transactions_maturity['trd_exctn_dt']
# convert the timedelta to whole days
transactions_maturity['time_to_maturity'] = transactions_maturity['time_to_maturity'].dt.days
# convert days to years (365.25 days per year, accounting for leap years)
transactions_maturity['time_to_maturity'] = transactions_maturity['time_to_maturity'] / 365.25
# round to 2 decimal places
transactions_maturity['time_to_maturity'] = np.around(transactions_maturity['time_to_maturity'], 2)
transactions_maturity
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month time_to_maturity
0 00077TAA2 2023-05-15 USA 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 8.30
1 00077TAA2 2023-05-15 USA 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 8.14
2 00077TAA2 2023-05-15 USA 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 8.04
3 00077TAA2 2023-05-15 USA 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 7.80
4 00077TAA2 2023-05-15 USA 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 7.73
... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 6.21
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 6.14
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 6.06
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 5.98
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 5.91

11177 rows × 9 columns
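A quick sanity check of the first row: 2015-01-26 to the 2023-05-15 maturity is 3031 days, and 3031 / 365.25 rounds to 8.30, matching the time_to_maturity shown above.

```python
from datetime import date

# day count from the first trade date to the bond's maturity
days = (date(2023, 5, 15) - date(2015, 1, 26)).days
print(days)                      # 3031
print(round(days / 365.25, 2))   # 8.3
```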

5.2 Zero-coupon yield

Zero-coupon yield: the yield on a bond that pays no coupons, i.e. the single discount rate that equates its current price to the face value repaid at maturity.

# read the zero-coupon Treasury yield curve file, skipping the 9 header rows
zero_coupon_file = '../data/feds200628.csv'
zero_coupon = pd.read_csv(zero_coupon_file, skiprows=9, usecols=list(range(31)))
# get the columns of the dataframe
zero_coupon.columns
Index(['Unnamed: 0', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05',
       'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11',
       'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17',
       'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23',
       'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29',
       'SVENY30'],
      dtype='object')
# rename the unnamed date column
zero_coupon = zero_coupon.rename(columns={'Unnamed: 0': 'date'})
zero_coupon.dtypes
date        object
SVENY01    float64
SVENY02    float64
SVENY03    float64
SVENY04    float64
SVENY05    float64
SVENY06    float64
SVENY07    float64
SVENY08    float64
SVENY09    float64
SVENY10    float64
SVENY11    float64
SVENY12    float64
SVENY13    float64
SVENY14    float64
SVENY15    float64
SVENY16    float64
SVENY17    float64
SVENY18    float64
SVENY19    float64
SVENY20    float64
SVENY21    float64
SVENY22    float64
SVENY23    float64
SVENY24    float64
SVENY25    float64
SVENY26    float64
SVENY27    float64
SVENY28    float64
SVENY29    float64
SVENY30    float64
dtype: object
zero_coupon['date'] = pd.to_datetime(zero_coupon['date'])
zero_coupon.dtypes
date       datetime64[ns]
SVENY01           float64
SVENY02           float64
SVENY03           float64
SVENY04           float64
SVENY05           float64
SVENY06           float64
SVENY07           float64
SVENY08           float64
SVENY09           float64
SVENY10           float64
SVENY11           float64
SVENY12           float64
SVENY13           float64
SVENY14           float64
SVENY15           float64
SVENY16           float64
SVENY17           float64
SVENY18           float64
SVENY19           float64
SVENY20           float64
SVENY21           float64
SVENY22           float64
SVENY23           float64
SVENY24           float64
SVENY25           float64
SVENY26           float64
SVENY27           float64
SVENY28           float64
SVENY29           float64
SVENY30           float64
dtype: object
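The string-to-datetime conversion above can also be done while reading the file, via `read_csv`'s `parse_dates` argument. A minimal sketch on a hypothetical in-memory CSV (not the real zero-coupon file):

```python
import io
import pandas as pd

# a tiny stand-in for the zero-coupon CSV (hypothetical data)
csv = io.StringIO("date,SVENY01\n2015-01-26,0.2029\n2015-03-24,0.3018\n")

# parse_dates converts the column during reading, so no separate
# pd.to_datetime step is needed afterwards
zc = pd.read_csv(csv, parse_dates=['date'])
print(zc['date'].dtype)  # datetime64[ns]
```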
transactions_maturity.dtypes
cusip_id                    object
maturity            datetime64[ns]
country_domicile            object
CUSIP_ID                    object
trd_exctn_dt        datetime64[ns]
trd_exctn_tm                object
yld_pt                     float64
year_month               period[M]
time_to_maturity           float64
dtype: object
# (earlier attempts at a row-by-row .loc lookup, kept commented out for reference)
# loc = zero_coupon.loc[zero_coupon['date'] == transactions_maturity['trd_exctn_dt']]
# merge the two dataframes with a left join on the trade date
transactions_zerocoupon = pd.merge(transactions_maturity, zero_coupon, how='left', right_on ='date', left_on='trd_exctn_dt')
# sort rows
transactions_zerocoupon.sort_values(by=['cusip_id', 'year_month'])
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month time_to_maturity date ... SVENY21 SVENY22 SVENY23 SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30
0 00077TAA2 2023-05-15 USA 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 8.30 2015-01-26 ... 2.3417 2.3649 2.3866 2.4067 2.4255 2.4430 2.4594 2.4748 2.4893 2.5028
1 00077TAA2 2023-05-15 USA 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 8.14 2015-03-24 ... 2.4235 2.4491 2.4731 2.4957 2.5168 2.5367 2.5553 2.5729 2.5895 2.6052
2 00077TAA2 2023-05-15 USA 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 8.04 2015-05-01 ... 2.7832 2.8154 2.8459 2.8746 2.9018 2.9276 2.9520 2.9751 2.9970 3.0179
3 00077TAA2 2023-05-15 USA 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 7.80 2015-07-27 ... 2.8422 2.8854 2.9288 2.9722 3.0157 3.0591 3.1025 3.1456 3.1886 3.2312
4 00077TAA2 2023-05-15 USA 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 7.73 2015-08-20 ... 2.6613 2.6985 2.7355 2.7723 2.8089 2.8453 2.8815 2.9173 2.9529 2.9880
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 6.21 2017-08-30 ... 2.6519 2.6866 2.7208 2.7544 2.7875 2.8199 2.8517 2.8829 2.9135 2.9434
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 6.14 2017-09-26 ... 2.6849 2.7176 2.7500 2.7821 2.8139 2.8452 2.8762 2.9066 2.9366 2.9661
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 6.06 2017-10-24 ... 2.8288 2.8605 2.8922 2.9239 2.9557 2.9874 3.0191 3.0506 3.0819 3.1129
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 5.98 2017-11-22 ... 2.6742 2.6985 2.7227 2.7467 2.7706 2.7944 2.8180 2.8414 2.8646 2.8877
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 5.91 2017-12-19 ... 2.7453 2.7672 2.7896 2.8125 2.8357 2.8593 2.8833 2.9075 2.9320 2.9566

11177 rows × 40 columns
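A left join keeps every transaction row and fills the curve columns with NaN wherever the trade date has no matching row in `zero_coupon`. Passing `indicator=True` makes those unmatched rows easy to count; a sketch on toy data (hypothetical values, not the assignment's frames):

```python
import pandas as pd

trades = pd.DataFrame({
    'trd_exctn_dt': pd.to_datetime(['2015-01-26', '2015-01-27']),
    'yld_pt': [8.30, 8.25],
})
curve = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-26']),
    'SVENY01': [0.2029],
})

# indicator=True adds a _merge column flagging rows with no curve match,
# which is exactly where NaNs appear in the merged table
merged = pd.merge(trades, curve, how='left',
                  left_on='trd_exctn_dt', right_on='date', indicator=True)
print(merged['_merge'].value_counts())
```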

# test the zero-padded column-name format on a single value
year = f'SVENY{round(8.30):02d}'
year
'SVENY08'
transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[f"SVENY{round(transactions_zerocoupon['time_to_maturity']):02d}"]
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-298-066c4c5b005f> in <module>
----> 1 transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[f"SVENY{round(transactions_zerocoupon['time_to_maturity']):02d}"]


TypeError: unsupported format string passed to Series.__format__
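The f-string fails because `round(...)` applied to a Series returns a Series, and a Series cannot be formatted with the `:02d` spec. One per-row column name per transaction can instead be built with vectorized string methods; a sketch on toy maturities (hypothetical values):

```python
import pandas as pd

ttm = pd.Series([8.30, 6.21, 0.7])

# round each value, cast to int, then zero-pad to two digits row by row
names = 'SVENY' + ttm.round().astype(int).astype(str).str.zfill(2)
print(names.tolist())  # ['SVENY08', 'SVENY06', 'SVENY01']
```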
transactions_zerocoupon.columns
Index(['cusip_id', 'maturity', 'country_domicile', 'CUSIP_ID', 'trd_exctn_dt',
       'trd_exctn_tm', 'yld_pt', 'year_month', 'time_to_maturity', 'date',
       'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06',
       'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12',
       'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18',
       'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24',
       'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30'],
      dtype='object')
transactions_zerocoupon[transactions_zerocoupon.columns[10]]
0        0.2029
1        0.3018
2        0.2832
3        0.3333
4        0.4325
          ...  
11172    1.2249
11173    1.3007
11174    1.4183
11175    1.5927
11176    1.7084
Name: SVENY01, Length: 11177, dtype: float64
transactions_zerocoupon['time_to_maturity'].max()
82.33
# cap time to maturity at 30 years; bump zero or fractional values up one year
# (the integer cast in the column lookup below truncates them back down)
transactions_zerocoupon['year'] = transactions_zerocoupon['time_to_maturity'].transform(lambda x : 30 if x > 29 else (x + 1 if x == 0 or not x.is_integer() else x))
# some test samples of the above lambda function
'''

x      function(x)

0      1
0.3    1.3
0.5    1.5
1      1
1.3    2.3
2      2
5.     5
9.04   10.04
29     29
29.1   30
30     30
31     30

'''
'\n\nx      function(x)\n\n0      1\n0.3    1.3\n0.5    1.5\n1      1\n1.3    2.3\n2      2\n5.     5\n9.04   10.04\n29     29\n29.1   30\n30     30\n31     30\n\n'
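As the samples show, fractional inputs (e.g. 0.3 → 1.3) only become whole years after the later `int()` cast. A more direct alternative (not the author's method) rounds to the nearest year and clips into the available SVENY01–SVENY30 range in one step:

```python
import pandas as pd

ttm = pd.Series([0.3, 1.3, 9.04, 29.1, 82.33])

# round to the nearest year, then clamp into the 1..30 curve range
year = ttm.round().clip(lower=1, upper=30).astype(int)
print(year.tolist())  # [1, 1, 9, 29, 30]
```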
transactions_zerocoupon['year'].min()
1.0
# transactions_zerocoupon['year'] = transactions_zerocoupon['year'].transform(lambda x: 1 if x < 0 else x)

transactions_zerocoupon
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month time_to_maturity date ... SVENY22 SVENY23 SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30 year
0 00077TAA2 2023-05-15 USA 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 8.30 2015-01-26 ... 2.3649 2.3866 2.4067 2.4255 2.4430 2.4594 2.4748 2.4893 2.5028 9.30
1 00077TAA2 2023-05-15 USA 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 8.14 2015-03-24 ... 2.4491 2.4731 2.4957 2.5168 2.5367 2.5553 2.5729 2.5895 2.6052 9.14
2 00077TAA2 2023-05-15 USA 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 8.04 2015-05-01 ... 2.8154 2.8459 2.8746 2.9018 2.9276 2.9520 2.9751 2.9970 3.0179 9.04
3 00077TAA2 2023-05-15 USA 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 7.80 2015-07-27 ... 2.8854 2.9288 2.9722 3.0157 3.0591 3.1025 3.1456 3.1886 3.2312 8.80
4 00077TAA2 2023-05-15 USA 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 7.73 2015-08-20 ... 2.6985 2.7355 2.7723 2.8089 2.8453 2.8815 2.9173 2.9529 2.9880 8.73
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 6.21 2017-08-30 ... 2.6866 2.7208 2.7544 2.7875 2.8199 2.8517 2.8829 2.9135 2.9434 7.21
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 6.14 2017-09-26 ... 2.7176 2.7500 2.7821 2.8139 2.8452 2.8762 2.9066 2.9366 2.9661 7.14
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 6.06 2017-10-24 ... 2.8605 2.8922 2.9239 2.9557 2.9874 3.0191 3.0506 3.0819 3.1129 7.06
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 5.98 2017-11-22 ... 2.6985 2.7227 2.7467 2.7706 2.7944 2.8180 2.8414 2.8646 2.8877 6.98
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 5.91 2017-12-19 ... 2.7672 2.7896 2.8125 2.8357 2.8593 2.8833 2.9075 2.9320 2.9566 6.91

11177 rows × 41 columns

# create the 'zero_coupon' column in advance to avoid the error
# 'Wrong number of items passed 11177, placement implies 1'
transactions_zerocoupon['zero_coupon'] = 0
# For each bond issue and trade, locate the zero-coupon yield of US treasury with the time to maturity closest to the bond issue’s.
transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[transactions_zerocoupon.columns[transactions_zerocoupon['year'].astype(int) + 8]]
transactions_zerocoupon['year']
0        9.30
1        9.14
2        9.04
3        8.80
4        8.73
         ... 
11172    7.21
11173    7.14
11174    7.06
11175    6.98
11176    6.91
Name: year, Length: 11177, dtype: float64
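The positional arithmetic above (`year + 8`) silently breaks if the column order ever changes. A position-independent per-row lookup can be done with numpy fancy indexing instead; a sketch on a toy frame (hypothetical values, not the assignment's exact data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'SVENY01': [0.20, 0.30],
    'SVENY02': [0.50, 0.60],
    'year': [1, 2],
})

# build each row's target column name, resolve it to a column position,
# then pick one (row, column) cell per row from the underlying array
cols = 'SVENY' + df['year'].astype(int).astype(str).str.zfill(2)
col_pos = df.columns.get_indexer(cols)
df['zero_coupon'] = df.to_numpy()[np.arange(len(df)), col_pos]
print(df['zero_coupon'].tolist())  # [0.2, 0.6]
```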
5.3 Credit spreads
# the credit spread is the bond issue's yield minus the matched zero-coupon Treasury yield
transactions_zerocoupon['credit_spreads'] = transactions_zerocoupon['yld_pt'] - transactions_zerocoupon['zero_coupon']
transactions_zerocoupon
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month time_to_maturity date ... SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30 year zero_coupon credit_spreads
0 00077TAA2 2023-05-15 USA 00077TAA2 2015-01-26 16:01:58 NaN 2015-01 8.30 2015-01-26 ... 2.4067 2.4255 2.4430 2.4594 2.4748 2.4893 2.5028 9.30 1.7586 NaN
1 00077TAA2 2023-05-15 USA 00077TAA2 2015-03-24 16:37:01 NaN 2015-03 8.14 2015-03-24 ... 2.4957 2.5168 2.5367 2.5553 2.5729 2.5895 2.6052 9.14 1.7968 NaN
2 00077TAA2 2023-05-15 USA 00077TAA2 2015-05-01 11:03:37 NaN 2015-05 8.04 2015-05-01 ... 2.8746 2.9018 2.9276 2.9520 2.9751 2.9970 3.0179 9.04 2.0255 NaN
3 00077TAA2 2023-05-15 USA 00077TAA2 2015-07-27 9:14:28 NaN 2015-07 7.80 2015-07-27 ... 2.9722 3.0157 3.0591 3.1025 3.1456 3.1886 3.2312 8.80 2.1064 NaN
4 00077TAA2 2023-05-15 USA 00077TAA2 2015-08-20 12:06:06 NaN 2015-08 7.73 2015-08-20 ... 2.7723 2.8089 2.8453 2.8815 2.9173 2.9529 2.9880 8.73 1.987 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 6.21 2017-08-30 ... 2.7544 2.7875 2.8199 2.8517 2.8829 2.9135 2.9434 7.21 2.0507 0.67949
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 6.14 2017-09-26 ... 2.7821 2.8139 2.8452 2.8762 2.9066 2.9366 2.9661 7.14 2.1521 0.70636
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 6.06 2017-10-24 ... 2.9239 2.9557 2.9874 3.0191 3.0506 3.0819 3.1129 7.06 2.3256 0.629487
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 5.98 2017-11-22 ... 2.7467 2.7706 2.7944 2.8180 2.8414 2.8646 2.8877 6.98 2.2634 0.672886
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 5.91 2017-12-19 ... 2.8125 2.8357 2.8593 2.8833 2.9075 2.9320 2.9566 6.91 2.41 0.550962

11177 rows × 43 columns

# drop rows with a missing bond yield or zero-coupon yield
# (a blanket dropna() would also remove rows that are NaN only in unrelated columns)
# transactions_zerocoupon = transactions_zerocoupon.dropna()
transactions_zerocoupon = transactions_zerocoupon[transactions_zerocoupon['zero_coupon'].notna()]
transactions_zerocoupon = transactions_zerocoupon[transactions_zerocoupon['yld_pt'].notna()]
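The two `notna` filters can be collapsed into a single `dropna` call restricted to those columns via `subset`; a minimal sketch on toy data (hypothetical values):

```python
import pandas as pd

df = pd.DataFrame({
    'yld_pt': [6.5, None, 2.7],
    'zero_coupon': [1.96, 2.05, None],
})

# subset= limits the NaN check to the named columns only
clean = df.dropna(subset=['yld_pt', 'zero_coupon'])
print(len(clean))  # 1
```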



transactions_zerocoupon
cusip_id maturity country_domicile CUSIP_ID trd_exctn_dt trd_exctn_tm yld_pt year_month time_to_maturity date ... SVENY24 SVENY25 SVENY26 SVENY27 SVENY28 SVENY29 SVENY30 year zero_coupon credit_spreads
21 00077TAB0 2093-10-15 USA 00077TAB0 2015-01-05 14:27:45 6.525827 2015-01 78.78 2015-01-05 ... 2.5902 2.6174 2.6438 2.6694 2.6942 2.7183 2.7415 30.00 1.964 4.56183
22 00077TAB0 2093-10-15 USA 00077TAB0 2015-02-17 11:24:15 6.897267 2015-02 78.66 2015-02-17 ... 2.7596 2.7781 2.7953 2.8114 2.8264 2.8405 2.8537 30.00 2.0779 4.81937
23 00077TAB0 2093-10-15 USA 00077TAB0 2015-03-31 14:59:18 6.816467 2015-03 78.54 2015-03-31 ... 2.5792 2.6016 2.6228 2.6427 2.6616 2.6794 2.6963 30.00 1.8462 4.97027
24 00077TAB0 2093-10-15 USA 00077TAB0 2015-04-24 12:10:55 6.029157 2015-04 78.48 2015-04-24 ... 2.6637 2.6893 2.7135 2.7363 2.7578 2.7781 2.7972 30.00 1.8252 4.20396
25 00077TAB0 2093-10-15 USA 00077TAB0 2015-06-04 11:17:07 6.404942 2015-06 78.37 2015-06-04 ... 3.1048 3.1394 3.1731 3.2059 3.2379 3.2689 3.2991 30.00 2.2132 4.19174
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11172 98934KAB6 2023-11-15 USA 98934KAB6 2017-08-30 10:53:27 2.730190 2017-08 6.21 2017-08-30 ... 2.7544 2.7875 2.8199 2.8517 2.8829 2.9135 2.9434 7.21 2.0507 0.67949
11173 98934KAB6 2023-11-15 USA 98934KAB6 2017-09-26 13:56:25 2.858460 2017-09 6.14 2017-09-26 ... 2.7821 2.8139 2.8452 2.8762 2.9066 2.9366 2.9661 7.14 2.1521 0.70636
11174 98934KAB6 2023-11-15 USA 98934KAB6 2017-10-24 12:57:12 2.955087 2017-10 6.06 2017-10-24 ... 2.9239 2.9557 2.9874 3.0191 3.0506 3.0819 3.1129 7.06 2.3256 0.629487
11175 98934KAB6 2023-11-15 USA 98934KAB6 2017-11-22 9:27:49 2.936286 2017-11 5.98 2017-11-22 ... 2.7467 2.7706 2.7944 2.8180 2.8414 2.8646 2.8877 6.98 2.2634 0.672886
11176 98934KAB6 2023-11-15 USA 98934KAB6 2017-12-19 12:22:55 2.960962 2017-12 5.91 2017-12-19 ... 2.8125 2.8357 2.8593 2.8833 2.9075 2.9320 2.9566 6.91 2.41 0.550962

7548 rows × 43 columns

transactions_zerocoupon.to_csv('../data/export/part1e.csv')
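By default `to_csv` also writes the integer row index as an unnamed first column, which is how an 'Unnamed: 0' column appears when such a file is read back. Passing `index=False` avoids that; a sketch writing to an in-memory buffer:

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

buf = io.StringIO()
df.to_csv(buf, index=False)  # omit the row-index column
print(buf.getvalue().splitlines()[0])  # 'a'
```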