【数据分析】基于pandas的wrds数据库金融数据分析
AcF 351b Career Skills in Accounting and Finance
Python for Data Analysis
Stream Assignment Part 1
给定美国债券市场部分英美企业债券发行的信息,通过下载分析WRDS数据库的金融和账务数据,评估英国脱欧对公司债券价格的影响。
💲1. Download and process bond fields data
The bond yield data are downloaded from TRACE on WRDS using WRDS web.
The downloaded file is saved as ‘e792b11838182376.csv’.
Now process the data in the file…
import pandas as pd
# check pandas version
print(pd.__version__)
1.1.3
# read csv as pandas dataframe
trace_on_wrds = '../data/e792b11838182376.csv'
yields = pd.read_csv(trace_on_wrds, usecols=[0, 3, 4, 5], parse_dates=[1])
# show dataframe
yields
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | |
---|---|---|---|---|
0 | 037833BC3 | 2015-05-07 | 7:57:03 | NaN |
1 | 037833BC3 | 2015-05-07 | 12:29:00 | NaN |
2 | 037833BC3 | 2015-05-07 | 12:29:00 | NaN |
3 | 037833BC3 | 2015-05-07 | 12:34:48 | NaN |
4 | 037833BC3 | 2015-05-07 | 12:37:44 | NaN |
... | ... | ... | ... | ... |
1326280 | 984121CK7 | 2017-12-28 | 12:08:56 | NaN |
1326281 | 984121CK7 | 2017-12-28 | 12:08:56 | NaN |
1326282 | 984121CK7 | 2017-12-28 | 12:08:57 | NaN |
1326283 | 984121CK7 | 2017-12-29 | 8:00:11 | NaN |
1326284 | 984121CK7 | 2017-12-29 | 8:00:11 | NaN |
1326285 rows × 4 columns
# extract month and year from the date field
yields['year_month'] = yields['trd_exctn_dt'].dt.to_period('M')
yields
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|
0 | 037833BC3 | 2015-05-07 | 7:57:03 | NaN | 2015-05 |
1 | 037833BC3 | 2015-05-07 | 12:29:00 | NaN | 2015-05 |
2 | 037833BC3 | 2015-05-07 | 12:29:00 | NaN | 2015-05 |
3 | 037833BC3 | 2015-05-07 | 12:34:48 | NaN | 2015-05 |
4 | 037833BC3 | 2015-05-07 | 12:37:44 | NaN | 2015-05 |
... | ... | ... | ... | ... | ... |
1326280 | 984121CK7 | 2017-12-28 | 12:08:56 | NaN | 2017-12 |
1326281 | 984121CK7 | 2017-12-28 | 12:08:56 | NaN | 2017-12 |
1326282 | 984121CK7 | 2017-12-28 | 12:08:57 | NaN | 2017-12 |
1326283 | 984121CK7 | 2017-12-29 | 8:00:11 | NaN | 2017-12 |
1326284 | 984121CK7 | 2017-12-29 | 8:00:11 | NaN | 2017-12 |
1326285 rows × 5 columns
# sort dataframe first by bond issue, then by trade date, and then by trade time in ascending order.
yields = yields.sort_values(by=['CUSIP_ID', 'trd_exctn_dt', 'trd_exctn_tm'], ascending=True)
yields[64:80]
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|
1198536 | 00077TAB0 | 2015-01-05 | 14:27:45 | 6.608047 | 2015-01 |
1198537 | 00077TAB0 | 2015-01-05 | 14:27:45 | 6.525827 | 2015-01 |
1198538 | 00077TAB0 | 2015-01-06 | 10:23:34 | 6.842175 | 2015-01 |
1198539 | 00077TAB0 | 2015-01-06 | 10:23:34 | 6.786439 | 2015-01 |
1198540 | 00077TAB0 | 2015-02-17 | 11:24:15 | 7.002821 | 2015-02 |
1198541 | 00077TAB0 | 2015-02-17 | 11:24:15 | 6.897267 | 2015-02 |
111651 | 00077TAB0 | 2015-03-31 | 14:47:54 | 6.758606 | 2015-03 |
111652 | 00077TAB0 | 2015-03-31 | 14:59:18 | 6.816467 | 2015-03 |
111653 | 00077TAB0 | 2015-03-31 | 14:59:18 | 6.816467 | 2015-03 |
111654 | 00077TAB0 | 2015-04-24 | 12:10:55 | 6.029157 | 2015-04 |
111655 | 00077TAB0 | 2015-04-24 | 12:10:55 | 6.029157 | 2015-04 |
111656 | 00077TAB0 | 2015-06-04 | 11:17:07 | 6.404942 | 2015-06 |
111657 | 00077TAB0 | 2015-06-10 | 11:09:56 | 6.492385 | 2015-06 |
111658 | 00077TAB0 | 2015-06-10 | 11:09:56 | 6.489400 | 2015-06 |
111659 | 00077TAB0 | 2015-06-10 | 11:09:56 | 6.484927 | 2015-06 |
111660 | 00077TAB0 | 2015-07-07 | 14:35:19 | 6.901996 | 2015-07 |
# for each bond issue in every trade day, we only keep the last yield that occurs in the day.
yields = yields.drop_duplicates(subset=['CUSIP_ID', 'trd_exctn_dt'], keep='last')
yields
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|
1198533 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 |
1198534 | 00077TAA2 | 2015-01-27 | 13:37:06 | NaN | 2015-01 |
1198535 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 |
111591 | 00077TAA2 | 2015-03-26 | 13:50:47 | NaN | 2015-03 |
111592 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 |
... | ... | ... | ... | ... | ... |
56862 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 |
56866 | 98934KAB6 | 2017-11-28 | 8:08:18 | 2.865028 | 2017-11 |
56877 | 98934KAB6 | 2017-11-30 | 17:23:00 | 2.971758 | 2017-11 |
56879 | 98934KAB6 | 2017-12-04 | 9:27:55 | 3.009462 | 2017-12 |
56881 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 |
149595 rows × 5 columns
# sort yields first by cusip, then by year_month, and then by yld_pt
yields = yields.sort_values(by=['CUSIP_ID', 'year_month', 'yld_pt'])
# check results
yields[0:10]
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|
1198533 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 |
1198534 | 00077TAA2 | 2015-01-27 | 13:37:06 | NaN | 2015-01 |
1198535 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 |
111591 | 00077TAA2 | 2015-03-26 | 13:50:47 | NaN | 2015-03 |
111592 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 |
111593 | 00077TAA2 | 2015-07-21 | 11:43:45 | NaN | 2015-07 |
111594 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 |
111595 | 00077TAA2 | 2015-07-30 | 8:12:42 | NaN | 2015-07 |
111597 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 |
111598 | 00077TAA2 | 2015-09-17 | 15:45:59 | NaN | 2015-09 |
# group yields by cysip and year_month, then get the indices of groups
yields_g = yields.groupby(['CUSIP_ID', 'year_month']).indices
# the indices of group items in source dataframe
list(yields_g.values())[0:10]
[array([0, 1]),
array([2, 3]),
array([4]),
array([5, 6, 7]),
array([8]),
array([9]),
array([10, 11]),
array([12, 13]),
array([14]),
array([15])]
# get the median indices in source dataframe of each group as list
import numpy as np
inds_median = [(sum(x) // len(x)) for x in list(yields_g.values())]
inds_median[0:10]
[0, 2, 4, 6, 8, 9, 10, 12, 14, 15]
# locate the yields data of each group in the source dataframe
# 这种方式获取的中位数不是算术意义的中位数
# 对于偶数个序列,算术中位数是求中间两位的平均值,而这种方式只是返回中间两位中任意一位的值。
yields_median = yields.iloc[inds_median]
# 这里得到的结果和FAQ中给的期望结果有些差别,但是也是OK的。
yields_median
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|
1198533 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 |
1198535 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 |
111592 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 |
111594 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 |
111597 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 |
... | ... | ... | ... | ... | ... |
56809 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 |
56840 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 |
56855 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 |
56862 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 |
56881 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 |
11177 rows × 5 columns
# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file,)
cusips
cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.7500 | N | N | N | N | 5.0 |
1 | 00077TAB0 | 150000.0 | 1993-10-13 | 2093-10-15 | CDEB | SUB | N | USA | 7.1250 | N | N | N | N | 6.0 |
2 | 001192AA1 | 300000.0 | 2001-02-23 | 2011-01-14 | CDEB | SEN | N | USA | 7.1250 | N | N | N | N | 117277.0 |
3 | 00163XAM2 | 175000.0 | 2006-08-10 | 2013-08-15 | CMTN | SEN | N | USA | 5.9000 | N | N | N | N | 351976.0 |
4 | 00206RBS0 | 1250000.0 | 2013-02-07 | 2016-02-12 | CDEB | SEN | N | USA | 0.7411 | N | N | N | N | 589049.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1055 | 961548AQ7 | 150000.0 | 1997-03-12 | 2027-03-15 | CDEB | SEN | N | USA | 7.6500 | N | N | N | N | 51090.0 |
1056 | 961548AS3 | 150000.0 | 1997-06-19 | 2027-06-15 | CDEB | SEN | N | USA | 7.5000 | N | N | N | N | 56219.0 |
1057 | 984121CK7 | 400000.0 | 2015-02-26 | 2020-09-01 | CDEB | SEN | N | USA | 2.7500 | N | N | N | N | 626119.0 |
1058 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.0000 | N | N | N | N | 24061.0 |
1059 | 989701AU1 | 34680.0 | 2009-06-01 | 2014-05-15 | CDEB | SUB | N | USA | 5.6500 | N | N | N | N | 498722.0 |
1060 rows × 14 columns
yields_median_usa_gbr = pd.merge(yields_median, cusips, how='inner', right_on ='cusip_id', left_on='CUSIP_ID')
yields_median_usa_gbr
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
1 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
2 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
3 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
4 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11173 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11174 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11175 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11176 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11177 rows × 19 columns
# export
part1a_median_file = '../data/export/part1a_median_file.csv'
yields_median_usa_gbr.to_csv(part1a_median_file)
# # get median of each month using groupby and a lambda function
# import warnings
# with warnings.catch_warnings():
# warnings.simplefilter("ignore", category=RuntimeWarning) # ignore RuntimeWarning
# yields['median'] = yields.groupby(['CUSIP_ID', 'year_month'])[['yld_pt']].transform(lambda x: x.median())
# # check results of 00077TAB0
# yields[40:60]
# yields = yields.sort_values(by=['CUSIP_ID', 'year_month', 'yld_pt'])
# # check results of 00077TAB0
# yields[40:60]
💲2. CRSP on WRDS and 180-day stock return volat
ref: https://support.eventstudy.com/hc/en-us/articles/212258983-CUSIP-6-8-or-9-characters-
CUSIP, which is controlled by Standard & Poor’s, is officially a 9-character string. The first 6 identify the issuer, the 7th and 8th identify the issue and the 9th is a check digit that is fully determined by the first 8 characters.
CRSP (the default data source for Eventus on WRDS and, when subscribed and installed, for Eventus for Windows and Linux) omits the check digit and uses an 8-character CUSIP field.
Common equity issues always have an 8th character of 0. As CRSP covers only common stocks, all CUSIPs in CRSP end in 0 as the 8th character. CUSIPs from other sources that have a non-zero digit in the 8th position are not common stock CUSIPs.
# link crsp permco/permno from cusip_ids
import pandas as pd
links = '../data/crsp_cusip_link.csv'
links_data = pd.read_csv(links)
links_data
CUSIP | PERMNO | PERMCO | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 84766 | 31989 | 20020711 | 20110318 | 20020102 | 20080424 | 20020711 | 20080424 |
1 | 00077TAA2 | 92355 | 28711 | 20120614 | 20150324 | 20071018 | 20200722 | 20120614 | 20150324 |
2 | 00077TAB0 | 84766 | 31989 | 20020906 | 20110328 | 20020102 | 20080424 | 20020906 | 20080424 |
3 | 00077TAB0 | 92355 | 28711 | 20120607 | 20150217 | 20071018 | 20200722 | 20120607 | 20150217 |
4 | 001192AA1 | 15553 | 116 | 20021206 | 20101216 | 20020102 | 20090208 | 20021206 | 20090208 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1124 | 961548AS3 | 21186 | 21913 | 20020726 | 20141231 | 20020130 | 20150701 | 20020726 | 20141231 |
1125 | 961548AS3 | 21186 | 21913 | 20151112 | 20170524 | 20150702 | 20200930 | 20151112 | 20170524 |
1126 | 984121CK7 | 27983 | 21945 | 20150226 | 20200828 | 20020102 | 20200930 | 20150226 | 20200828 |
1127 | 98934KAB6 | 79363 | 29908 | 20020702 | 20200923 | 20020102 | 20200930 | 20020702 | 20200923 |
1128 | 989701AU1 | 84129 | 5057 | 20090731 | 20140414 | 19821101 | 20200930 | 20090731 | 20140414 |
1129 rows × 9 columns
import numpy as np
# read csv as pandas dataframe
# 下载数据从2014年6月开始。因为要计算180天rolling标准差
# crspa_dsf = '../data/880fcbe48e8d30ce.csv'
# 下载数据从2014年4月开始。因为要计算180天rolling标准差
crspa_dsf = '../data/e862f245484c3340.csv'
# read (PERMNO, date, PERMCO, CUSIP, RET)
dsf = pd.read_csv(crspa_dsf, usecols = [0, 1, 15, 48])
dsf
/Users/jade_mayer/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3146: DtypeWarning: Columns (48) have mixed types.Specify dtype option on import or set low_memory=False.
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
PERMNO | date | PERMCO | RET | |
---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 |
... | ... | ... | ... | ... |
140596 | 93013 | 2017/12/22 | 53204 | 0.001886 |
140597 | 93013 | 2017/12/26 | 53204 | -0.001345 |
140598 | 93013 | 2017/12/27 | 53204 | 0.006642 |
140599 | 93013 | 2017/12/28 | 53204 | -0.007936 |
140600 | 93013 | 2017/12/29 | 53204 | 0.002696 |
140601 rows × 4 columns
Missing Value Codes in SAS
When looking up a holding period return (RET), or delisting return (DLRET), SAS may return a missing value code. These are indicated with a decimal point followed by a letter (e.g. “.E”, “.D”). The descriptions for the missing value codes in SAS can be found on the “Variable Description” page for CRSP Stock.
# missing value
# B means Off-exchange
subset = dsf.loc[dsf['RET'] == 'B'].count()
subset
PERMNO 0
date 0
PERMCO 0
RET 0
dtype: int64
# missing value
# C means No valid previous price
subset = dsf.loc[dsf['RET'] == 'C'].count()
subset
PERMNO 3
date 3
PERMCO 3
RET 3
dtype: int64
# remove missing data rows
dsf = dsf[dsf.RET != 'C']
# df = df[df.RET != 'B']
dsf
PERMNO | date | PERMCO | RET | |
---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 |
... | ... | ... | ... | ... |
140596 | 93013 | 2017/12/22 | 53204 | 0.001886 |
140597 | 93013 | 2017/12/26 | 53204 | -0.001345 |
140598 | 93013 | 2017/12/27 | 53204 | 0.006642 |
140599 | 93013 | 2017/12/28 | 53204 | -0.007936 |
140600 | 93013 | 2017/12/29 | 53204 | 0.002696 |
140598 rows × 4 columns
# calculate rolling standard deviation of 180-day
# dsf['rolling'] =
rollings = dsf.groupby('PERMNO')['RET'].rolling(180).std()
# rollings_df = pd.DataFrame(data = [rollings.values] * len(rollings), columns = rollings.index)
rollings[(10145, 0)]
nan
dsf.index
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7,
8, 9,
...
140591, 140592, 140593, 140594, 140595, 140596, 140597, 140598,
140599, 140600],
dtype='int64', length=140598)
# join rolling data with source dataframe on (dsf.index, 'PERMNO')
dsf = dsf.merge(rollings.rename('rollings_'), on=[dsf.index, 'PERMNO'] ,left_index=True, right_index=True)
dsf
PERMNO | date | PERMCO | RET | rollings_ | |
---|---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 | NaN |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 | NaN |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 | NaN |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 | NaN |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 | NaN |
... | ... | ... | ... | ... | ... |
140596 | 93013 | 2017/12/22 | 53204 | 0.001886 | 0.010455 |
140597 | 93013 | 2017/12/26 | 53204 | -0.001345 | 0.010445 |
140598 | 93013 | 2017/12/27 | 53204 | 0.006642 | 0.010446 |
140599 | 93013 | 2017/12/28 | 53204 | -0.007936 | 0.010463 |
140600 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 |
140598 rows × 5 columns
dsf_cusip = pd.merge(dsf, links_data, on=['PERMNO'], how='inner')
dsf_cusip
PERMNO | date | PERMCO_x | RET | rollings_ | CUSIP | PERMCO_y | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1001488 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 03523TBK3 | 53204 | 20110707 | 20140623 | 20090916 | 20200930 | 20110707 | 20140623 |
1001489 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 035240AC4 | 53204 | 20161220 | 20180725 | 20090916 | 20200930 | 20161220 | 20180725 |
1001490 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 035242AF3 | 53204 | 20140122 | 20190131 | 20090916 | 20200930 | 20140122 | 20190131 |
1001491 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 035242AK2 | 53204 | 20160113 | 20200930 | 20090916 | 20200930 | 20160113 | 20200930 |
1001492 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | 20200930 | 20140122 | 20170117 |
1001493 rows × 13 columns
dsf_cusip.loc[dsf_cusip['PERMNO'] == 10145]
PERMNO | date | PERMCO_x | RET | rollings_ | CUSIP | PERMCO_y | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
931 | 10145 | 2017/12/22 | 22168 | -0.001496 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
932 | 10145 | 2017/12/26 | 22168 | 0.001759 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
933 | 10145 | 2017/12/27 | 22168 | 0.001236 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
934 | 10145 | 2017/12/28 | 22168 | 0.001169 | 0.006601 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
935 | 10145 | 2017/12/29 | 22168 | -0.004996 | 0.006600 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
936 rows × 13 columns
# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file)
cusips
cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.7500 | N | N | N | N | 5.0 |
1 | 00077TAB0 | 150000.0 | 1993-10-13 | 2093-10-15 | CDEB | SUB | N | USA | 7.1250 | N | N | N | N | 6.0 |
2 | 001192AA1 | 300000.0 | 2001-02-23 | 2011-01-14 | CDEB | SEN | N | USA | 7.1250 | N | N | N | N | 117277.0 |
3 | 00163XAM2 | 175000.0 | 2006-08-10 | 2013-08-15 | CMTN | SEN | N | USA | 5.9000 | N | N | N | N | 351976.0 |
4 | 00206RBS0 | 1250000.0 | 2013-02-07 | 2016-02-12 | CDEB | SEN | N | USA | 0.7411 | N | N | N | N | 589049.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1055 | 961548AQ7 | 150000.0 | 1997-03-12 | 2027-03-15 | CDEB | SEN | N | USA | 7.6500 | N | N | N | N | 51090.0 |
1056 | 961548AS3 | 150000.0 | 1997-06-19 | 2027-06-15 | CDEB | SEN | N | USA | 7.5000 | N | N | N | N | 56219.0 |
1057 | 984121CK7 | 400000.0 | 2015-02-26 | 2020-09-01 | CDEB | SEN | N | USA | 2.7500 | N | N | N | N | 626119.0 |
1058 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.0000 | N | N | N | N | 24061.0 |
1059 | 989701AU1 | 34680.0 | 2009-06-01 | 2014-05-15 | CDEB | SUB | N | USA | 5.6500 | N | N | N | N | 498722.0 |
1060 rows × 14 columns
dsf_cusip_country = pd.merge(dsf_cusip, cusips, left_on=['CUSIP'], right_on=['cusip_id'], how='inner')
dsf_cusip_country.loc[dsf_cusip['PERMNO'] == 10145]
PERMNO | date | PERMCO_x | RET | rollings_ | CUSIP | PERMCO_y | trace_startdt | trace_enddt | crsp_startdt | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10145 | 2014/04/15 | 22168 | 0.008689 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
1 | 10145 | 2014/04/16 | 22168 | 0.017337 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
2 | 10145 | 2014/04/17 | 22168 | -0.002144 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
3 | 10145 | 2014/04/21 | 22168 | 0.000859 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
4 | 10145 | 2014/04/22 | 22168 | 0.001717 | NaN | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
931 | 10145 | 2017/12/22 | 22168 | -0.001496 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
932 | 10145 | 2017/12/26 | 22168 | 0.001759 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
933 | 10145 | 2017/12/27 | 22168 | 0.001236 | 0.006667 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
934 | 10145 | 2017/12/28 | 22168 | 0.001169 | 0.006601 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
935 | 10145 | 2017/12/29 | 22168 | -0.004996 | 0.006600 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
936 rows × 27 columns
dsf_cusip_country_nnan = dsf_cusip_country.dropna()
dsf_cusip_country_nnan
PERMNO | date | PERMCO_x | RET | rollings_ | CUSIP | PERMCO_y | trace_startdt | trace_enddt | crsp_startdt | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
179 | 10145 | 2014/12/30 | 22168 | -0.009344 | 0.010120 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
180 | 10145 | 2014/12/31 | 22168 | -0.007943 | 0.010122 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
181 | 10145 | 2015/01/02 | 22168 | 0.003103 | 0.010046 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
182 | 10145 | 2015/01/05 | 22168 | -0.019056 | 0.010150 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
183 | 10145 | 2015/01/06 | 22168 | -0.002339 | 0.010152 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1001488 | 93013 | 2017/12/22 | 53204 | 0.001886 | 0.010455 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001489 | 93013 | 2017/12/26 | 53204 | -0.001345 | 0.010445 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001490 | 93013 | 2017/12/27 | 53204 | 0.006642 | 0.010446 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001491 | 93013 | 2017/12/28 | 53204 | -0.007936 | 0.010463 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001492 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
786357 rows × 27 columns
dsf_cusip_country_ = dsf_cusip_country_nnan.drop_duplicates(subset=['cusip_id', 'date'], keep='last')
dsf_cusip_country_
PERMNO | date | PERMCO_x | RET | rollings_ | CUSIP | PERMCO_y | trace_startdt | trace_enddt | crsp_startdt | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
179 | 10145 | 2014/12/30 | 22168 | -0.009344 | 0.010120 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
180 | 10145 | 2014/12/31 | 22168 | -0.007943 | 0.010122 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
181 | 10145 | 2015/01/02 | 22168 | 0.003103 | 0.010046 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
182 | 10145 | 2015/01/05 | 22168 | -0.019056 | 0.010150 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
183 | 10145 | 2015/01/06 | 22168 | -0.002339 | 0.010152 | 438516BK1 | 22168 | 20161024 | 20191024 | 20020102 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1001488 | 93013 | 2017/12/22 | 53204 | 0.001886 | 0.010455 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001489 | 93013 | 2017/12/26 | 53204 | -0.001345 | 0.010445 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001490 | 93013 | 2017/12/27 | 53204 | 0.006642 | 0.010446 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001491 | 93013 | 2017/12/28 | 53204 | -0.007936 | 0.010463 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
1001492 | 93013 | 2017/12/29 | 53204 | 0.002696 | 0.010431 | 03524BAD8 | 53204 | 20140122 | 20170117 | 20090916 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
716846 rows × 27 columns
dsf_cusip_country_.to_csv('../data/export/part1b_180days_rolling_window.csv')
💲3. PRCCQ from Compustat
ref: https://bbs.pinggu.org/thread-3588010-1-1.html
关于market leverage和book leverage的研究可以追溯到Fama和French在1992年的论文The Cross-Section of Expected Stock Returns(没错就是讲三因子模型的那篇)。
论文里关于两种leverage的定义原话是We use two leverage variables, the ratio of book assets to market equity, A/ME, and the ratio of book assets to book equity, A/BE. We interpret A/ME as a measure of market leverage, while A/BE is a measure of book leverage.
A指的是Total Assets,ME指的是Market Value of Equity,BE指的是Book Value of Equity。Market leverage和book leverage是对公司财务杠杆的两种衡量方式,我们平时一般说的financial leverage指的是book leverage。
顺带一提,论文中的结论是股票收益和market leverage成正相关,和book leverage不相关甚至轻微负相关
ref: https://www.indeed.com/career-advice/career-development/how-to-calculate-total-debt
Total debt is calculated by adding up a company’s liabilities, or debts, which are categorized as short and long-term debt. Financial lenders or business leaders may look at a company’s balance sheet to factor in the debt ratio to make informed decisions about future loan options. They calculate the debt ratio by taking the total debt and dividing it by the total assets.
Outstanding Debt
Debt that has not yet been repaid in full. For example, if one borrows $10,000 and has paid back $2,000, the outstanding debt is $8,000. In general, interest is calculated over the outstanding debt rather than the original amount borrowed.
ref: https://www.lawinsider.com/dictionary/book-leverage-ratio
Book Leverage Ratio means the ratio of Total Consolidated Long Term Debt to Total Assets, as shown in the applicable Financial Statements for Guarantor A for any accounting period and determined in accordance with GAAP.
ref: https://www.quora.com/What-are-the-current-liabilities-in-a-balance-sheet
Liability: Liability is something which the business owes. For example, a loan taken from a bank.
Classification of Liabilities-
Current liabilities: Current liabilities are the liabilities which the business has to pay within a year. These are short-term liabilities. For example, trade creditors. Trade Creditors are the suppliers from whom we purchase the goods on credit. Usually, the payment to trade creditors is made within one year. Other examples of current liabilities are outstanding(unpaid) salaries, tax payable, etc.
Non-Current Liabilities: Non-Current Liabilities are long term liabilities which are not repayable within one year. For example, Long term loan taken for say five years.
Contingent Liability: Contingent Liability is that kind of a liability which is non-existent as on date, but it may become an actual liability in the future. For example, if a customer has filed a suit against the company for some compensation. This can become an actual liability in the future if the firm loses the case. However, as on date, it is not a liability as the outcome is not known today.
关于CRSP / Compustat链接表的详细信息:
http://kaichen.work/?p=138
gvkey_crsp_link = '../data/gvkey_crsp_link.csv'
gvkey = pd.read_csv(gvkey_crsp_link)
# gvkey = gvkey.dropna()
# gvkey_ = gvkey['gvkey']
# gvkey_.to_csv('../gvkeys_unique.csv')
gvkey
GVKEY | LINKPRIM | LIID | LINKTYPE | LPERMNO | LPERMCO | LINKDT | LINKENDDT | conm | tic | ... | ipodate | dldte | STKO | FYRC | GSECTOR | GGROUP | GIND | GSUBIND | SPCINDCD | SPCSECCD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | P | 01 | LC | 10145 | 22168 | 19620131 | E | HONEYWELL INTERNATIONAL INC | HON | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
1 | 1318 | P | 01 | LU | 41443 | 20057 | 19650118 | 20071130 | ALLTEL CORP | AT.2 | ... | NaN | 20071119.0 | 0 | 12 | 50.0 | 5010.0 | 501020.0 | 50102010.0 | 715 | 974 |
2 | 1440 | P | 01 | LC | 24109 | 20077 | 19620131 | E | AMERICAN ELECTRIC POWER CO | AEP | ... | NaN | NaN | 0 | 12 | 55.0 | 5510.0 | 551010.0 | 55101010.0 | 705 | 700 |
3 | 1440 | C | 00X | LC | 24109 | 20077 | 19510101 | 19620130 | AMERICAN ELECTRIC POWER CO | AEP | ... | NaN | NaN | 0 | 12 | 55.0 | 5510.0 | 551010.0 | 55101010.0 | 705 | 700 |
4 | 1447 | C | 01 | LU | 59176 | 90 | 19721214 | E | AMERICAN EXPRESS CO | AXP | ... | NaN | NaN | 0 | 12 | 40.0 | 4020.0 | 402020.0 | 40202010.0 | 850 | 800 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
231 | 209382 | P | 91 | LC | 89330 | 37665 | 20020322 | E | VALE SA | VALE | ... | NaN | NaN | 0 | 12 | 15.0 | 1510.0 | 151040.0 | 15104050.0 | 360 | 970 |
232 | 212340 | P | 90 | LU | 85888 | 16024 | 19980327 | 20190131 | SHIRE PLC | SHPG | ... | 19960215.0 | 20190109.0 | 0 | 12 | 35.0 | 3520.0 | 352010.0 | 35201010.0 | 285 | 905 |
233 | 241216 | P | 90 | LC | 88822 | 41009 | 20001113 | E | SYNGENTA AG | SYT | ... | 20001113.0 | 20180110.0 | 0 | 12 | 15.0 | 1510.0 | 151010.0 | 15101030.0 | 167 | 970 |
234 | 241637 | P | 90 | LC | 93013 | 53204 | 20090916 | E | ANHEUSER-BUSCH INBEV | BUD | ... | NaN | NaN | 0 | 12 | 30.0 | 3020.0 | 302010.0 | 30201010.0 | 140 | 978 |
235 | 252940 | P | 90 | LC | 75811 | 22107 | 20010402 | E | MITSUBISHI UFJ FINANCIAL GRP | MUFG | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
236 rows × 47 columns
# fundq = '../data/7cf711fc5ea7973d.csv'
fundq = '../data/309e6ea3f87e4430.csv'
# open fundq as dataframe
df = pd.read_csv(fundq)
df
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | datafqtr | atq | cshoq | dlttq | lctq | costat | prccq | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | 2015Q1 | 45357.000 | 781.707 | 5661.000 | 15432.0 | A | 104.31 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | 2015Q2 | 46412.000 | 781.762 | 5562.000 | 15574.0 | A | 101.97 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | 2015Q3 | 46625.000 | 770.691 | 5599.000 | 16367.0 | A | 94.69 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | 2015Q4 | 49316.000 | 770.400 | 5554.000 | 18371.0 | A | 103.57 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | 2016Q1 | 50365.000 | 762.115 | 9700.000 | 15659.0 | A | 112.05 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1851 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | 2016Q3 | 2586542.894 | 13429.631 | 219398.261 | NaN | A | 6.16 |
1852 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | 2016Q4 | 2722353.780 | 13429.944 | 241134.298 | NaN | A | 6.34 |
1853 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | 2017Q1 | NaN | NaN | NaN | NaN | A | 6.75 |
1854 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | 2017Q2 | 2666912.107 | 13289.784 | 267516.335 | NaN | A | 6.44 |
1855 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | 2017Q3 | NaN | NaN | NaN | NaN | A | 7.27 |
1856 rows × 17 columns
# compute total outstanding debt
df['total_debt'] = df['dlttq'] + df['lctq']
# compute book leverage
df['book_leverage'] = df['total_debt'] / df['atq']
df
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | datafqtr | atq | cshoq | dlttq | lctq | costat | prccq | total_debt | book_leverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | 2015Q1 | 45357.000 | 781.707 | 5661.000 | 15432.0 | A | 104.31 | 21093.0 | 0.465044 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | 2015Q2 | 46412.000 | 781.762 | 5562.000 | 15574.0 | A | 101.97 | 21136.0 | 0.455399 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | 2015Q3 | 46625.000 | 770.691 | 5599.000 | 16367.0 | A | 94.69 | 21966.0 | 0.471121 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | 2015Q4 | 49316.000 | 770.400 | 5554.000 | 18371.0 | A | 103.57 | 23925.0 | 0.485137 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | 2016Q1 | 50365.000 | 762.115 | 9700.000 | 15659.0 | A | 112.05 | 25359.0 | 0.503504 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1851 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | 2016Q3 | 2586542.894 | 13429.631 | 219398.261 | NaN | A | 6.16 | NaN | NaN |
1852 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | 2016Q4 | 2722353.780 | 13429.944 | 241134.298 | NaN | A | 6.34 | NaN | NaN |
1853 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | 2017Q1 | NaN | NaN | NaN | NaN | A | 6.75 | NaN | NaN |
1854 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | 2017Q2 | 2666912.107 | 13289.784 | 267516.335 | NaN | A | 6.44 | NaN | NaN |
1855 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | 2017Q3 | NaN | NaN | NaN | NaN | A | 7.27 | NaN | NaN |
1856 rows × 19 columns
# compute market capitalization
df['market_capitalization'] = (df['prccq'] * df['cshoq'])
# compute market leverage
df['market_leverage'] = df['total_debt'] / df['market_capitalization']
df
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | atq | cshoq | dlttq | lctq | costat | prccq | total_debt | book_leverage | market_capitalization | market_leverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | 45357.000 | 781.707 | 5661.000 | 15432.0 | A | 104.31 | 21093.0 | 0.465044 | 81539.85717 | 0.258683 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | 46412.000 | 781.762 | 5562.000 | 15574.0 | A | 101.97 | 21136.0 | 0.455399 | 79716.27114 | 0.265140 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | 46625.000 | 770.691 | 5599.000 | 16367.0 | A | 94.69 | 21966.0 | 0.471121 | 72976.73079 | 0.301000 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | 49316.000 | 770.400 | 5554.000 | 18371.0 | A | 103.57 | 23925.0 | 0.485137 | 79790.32800 | 0.299848 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | 50365.000 | 762.115 | 9700.000 | 15659.0 | A | 112.05 | 25359.0 | 0.503504 | 85394.98575 | 0.296961 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1851 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | ... | 2586542.894 | 13429.631 | 219398.261 | NaN | A | 6.16 | NaN | NaN | 82726.52696 | NaN |
1852 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | ... | 2722353.780 | 13429.944 | 241134.298 | NaN | A | 6.34 | NaN | NaN | 85145.84496 | NaN |
1853 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | ... | NaN | NaN | NaN | NaN | A | 6.75 | NaN | NaN | NaN | NaN |
1854 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | ... | 2666912.107 | 13289.784 | 267516.335 | NaN | A | 6.44 | NaN | NaN | 85586.20896 | NaN |
1855 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | ... | NaN | NaN | NaN | NaN | A | 7.27 | NaN | NaN | NaN | NaN |
1856 rows × 21 columns
leverage = pd.merge(df, gvkey, left_on='gvkey', right_on='GVKEY')
leverage
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | ipodate | dldte | STKO | FYRC | GSECTOR | GGROUP | GIND | GSUBIND | SPCINDCD | SPCSECCD | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | NaN | NaN | 0 | 12 | 20.0 | 2010.0 | 201050.0 | 20105010.0 | 355 | 925 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2383 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
2384 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
2385 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
2386 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
2387 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | ... | 20010401.0 | NaN | 0 | 3 | 40.0 | 4010.0 | 401010.0 | 40101010.0 | 810 | 800 |
2388 rows × 68 columns
links_data
CUSIP | PERMNO | PERMCO | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 84766 | 31989 | 20020711 | 20110318 | 20020102 | 20080424 | 20020711 | 20080424 |
1 | 00077TAA2 | 92355 | 28711 | 20120614 | 20150324 | 20071018 | 20200722 | 20120614 | 20150324 |
2 | 00077TAB0 | 84766 | 31989 | 20020906 | 20110328 | 20020102 | 20080424 | 20020906 | 20080424 |
3 | 00077TAB0 | 92355 | 28711 | 20120607 | 20150217 | 20071018 | 20200722 | 20120607 | 20150217 |
4 | 001192AA1 | 15553 | 116 | 20021206 | 20101216 | 20020102 | 20090208 | 20021206 | 20090208 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1124 | 961548AS3 | 21186 | 21913 | 20020726 | 20141231 | 20020130 | 20150701 | 20020726 | 20141231 |
1125 | 961548AS3 | 21186 | 21913 | 20151112 | 20170524 | 20150702 | 20200930 | 20151112 | 20170524 |
1126 | 984121CK7 | 27983 | 21945 | 20150226 | 20200828 | 20020102 | 20200930 | 20150226 | 20200828 |
1127 | 98934KAB6 | 79363 | 29908 | 20020702 | 20200923 | 20020102 | 20200930 | 20020702 | 20200923 |
1128 | 989701AU1 | 84129 | 5057 | 20090731 | 20140414 | 19821101 | 20200930 | 20090731 | 20140414 |
1129 rows × 9 columns
leverge_ = pd.merge(leverage, links_data, left_on='LPERMNO', right_on='PERMNO')
leverge_
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | SPCSECCD | CUSIP | PERMNO | PERMCO | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15093 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15094 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15095 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15096 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15097 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15098 rows × 77 columns
leverge_unique = leverge_.drop_duplicates(subset=['CUSIP', 'datadate'], keep='last')
leverge_unique
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | SPCSECCD | CUSIP | PERMNO | PERMCO | trace_startdt | trace_enddt | crsp_startdt | crsp_enddt | link_startdt | link_enddt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | 925 | 438516BK1 | 10145 | 22168 | 20161024 | 20191024 | 20020102 | 20200930 | 20161024 | 20191024 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15093 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15094 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15095 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15096 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
15097 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | ... | 800 | 904851AA0 | 75811 | 22107 | 20120816 | 20131203 | 20051003 | 20180401 | 20120816 | 20131203 |
11817 rows × 77 columns
# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file)
cusips
cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.7500 | N | N | N | N | 5.0 |
1 | 00077TAB0 | 150000.0 | 1993-10-13 | 2093-10-15 | CDEB | SUB | N | USA | 7.1250 | N | N | N | N | 6.0 |
2 | 001192AA1 | 300000.0 | 2001-02-23 | 2011-01-14 | CDEB | SEN | N | USA | 7.1250 | N | N | N | N | 117277.0 |
3 | 00163XAM2 | 175000.0 | 2006-08-10 | 2013-08-15 | CMTN | SEN | N | USA | 5.9000 | N | N | N | N | 351976.0 |
4 | 00206RBS0 | 1250000.0 | 2013-02-07 | 2016-02-12 | CDEB | SEN | N | USA | 0.7411 | N | N | N | N | 589049.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1055 | 961548AQ7 | 150000.0 | 1997-03-12 | 2027-03-15 | CDEB | SEN | N | USA | 7.6500 | N | N | N | N | 51090.0 |
1056 | 961548AS3 | 150000.0 | 1997-06-19 | 2027-06-15 | CDEB | SEN | N | USA | 7.5000 | N | N | N | N | 56219.0 |
1057 | 984121CK7 | 400000.0 | 2015-02-26 | 2020-09-01 | CDEB | SEN | N | USA | 2.7500 | N | N | N | N | 626119.0 |
1058 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.0000 | N | N | N | N | 24061.0 |
1059 | 989701AU1 | 34680.0 | 2009-06-01 | 2014-05-15 | CDEB | SUB | N | USA | 5.6500 | N | N | N | N | 498722.0 |
1060 rows × 14 columns
leverge_unique_country = pd.merge(leverge_unique, cusips, left_on='CUSIP', right_on='cusip_id')
leverge_unique_country
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11812 | 252940 | 2016/12/31 | 2016 | 3 | INDL | C | D | STD | USD | 2016Q4 | ... | CDEB | SUB | N | USA | 5.25000 | N | N | N | N | 176262.0 |
11813 | 252940 | 2017/03/31 | 2016 | 4 | INDL | C | D | STD | USD | 2017Q1 | ... | CDEB | SUB | N | USA | 5.25000 | N | N | N | N | 176262.0 |
11814 | 252940 | 2017/06/30 | 2017 | 1 | INDL | C | D | STD | USD | 2017Q2 | ... | CDEB | SUB | N | USA | 5.25000 | N | N | N | N | 176262.0 |
11815 | 252940 | 2017/09/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q3 | ... | CDEB | SUB | N | USA | 5.25000 | N | N | N | N | 176262.0 |
11816 | 252940 | 2017/12/31 | 2017 | 3 | INDL | C | D | STD | USD | 2017Q4 | ... | CDEB | SUB | N | USA | 5.25000 | N | N | N | N | 176262.0 |
11817 rows × 91 columns
# leverge_unique_country = leverge_unique_country.dropna()
leverge_unique_country = leverge_unique_country[leverge_unique_country['market_leverage'].notna()]
leverge_unique_country
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11796 | 241637 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11798 | 241637 | 2016/06/30 | 2016 | 2 | INDL | C | D | STD | USD | 2016Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11800 | 241637 | 2016/12/31 | 2016 | 4 | INDL | C | D | STD | USD | 2016Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11802 | 241637 | 2017/06/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11804 | 241637 | 2017/12/31 | 2017 | 4 | INDL | C | D | STD | USD | 2017Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
3205 rows × 91 columns
leverge_unique_country = leverge_unique_country[leverge_unique_country['book_leverage'].notna()]
leverge_unique_country
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11796 | 241637 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11798 | 241637 | 2016/06/30 | 2016 | 2 | INDL | C | D | STD | USD | 2016Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11800 | 241637 | 2016/12/31 | 2016 | 4 | INDL | C | D | STD | USD | 2016Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11802 | 241637 | 2017/06/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11804 | 241637 | 2017/12/31 | 2017 | 4 | INDL | C | D | STD | USD | 2017Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
3205 rows × 91 columns
leverge_unique_country
gvkey | datadate | fyearq | fqtr | indfmt | consol | popsrc | datafmt | curcdq | datacqtr | ... | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1300 | 2015/03/31 | 2015 | 1 | INDL | C | D | STD | USD | 2015Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
1 | 1300 | 2015/06/30 | 2015 | 2 | INDL | C | D | STD | USD | 2015Q2 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
2 | 1300 | 2015/09/30 | 2015 | 3 | INDL | C | D | STD | USD | 2015Q3 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
3 | 1300 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
4 | 1300 | 2016/03/31 | 2016 | 1 | INDL | C | D | STD | USD | 2016Q1 | ... | CDEB | SEN | N | USA | 2.54658 | N | N | N | N | 663731.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11796 | 241637 | 2015/12/31 | 2015 | 4 | INDL | C | D | STD | USD | 2015Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11798 | 241637 | 2016/06/30 | 2016 | 2 | INDL | C | D | STD | USD | 2016Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11800 | 241637 | 2016/12/31 | 2016 | 4 | INDL | C | D | STD | USD | 2016Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11802 | 241637 | 2017/06/30 | 2017 | 2 | INDL | C | D | STD | USD | 2017Q2 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
11804 | 241637 | 2017/12/31 | 2017 | 4 | INDL | C | D | STD | USD | 2017Q4 | ... | CDEB | SEN | N | USA | 1.07567 | N | N | N | N | 605853.0 |
3205 rows × 91 columns
leverge_unique_country.to_csv('../data/export/part1c_leverge.csv')
💲4. Locate the latest credit rating
# monthly yields data got in Part1(a)
yields_median_usa_gbr
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
1 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
2 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
3 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
4 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11173 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11174 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11175 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11176 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11177 rows × 19 columns
# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file, usecols=[0, 13])
cusips
cusip_id | issue_id | |
---|---|---|
0 | 00077TAA2 | 5.0 |
1 | 00077TAB0 | 6.0 |
2 | 001192AA1 | 117277.0 |
3 | 00163XAM2 | 351976.0 |
4 | 00206RBS0 | 589049.0 |
... | ... | ... |
1055 | 961548AQ7 | 51090.0 |
1056 | 961548AS3 | 56219.0 |
1057 | 984121CK7 | 626119.0 |
1058 | 98934KAB6 | 24061.0 |
1059 | 989701AU1 | 498722.0 |
1060 rows × 2 columns
cusips = cusips.sort_values(by=['issue_id'])
# remove cusip 40056E6B9 beacuse it lose issus_id data
cusips.dropna(inplace=True)
cusips.issue_id.astype(np.int64)
cusips
cusip_id | issue_id | |
---|---|---|
0 | 00077TAA2 | 5.0 |
1 | 00077TAB0 | 6.0 |
40 | 030955AB4 | 1057.0 |
128 | 126168AB9 | 3024.0 |
391 | 386088AB4 | 11624.0 |
... | ... | ... |
811 | 61769HDQ5 | 830907.0 |
966 | 80283LAX1 | 832065.0 |
614 | 40056FJX4 | 832139.0 |
615 | 40056FKS3 | 834537.0 |
846 | 674599CT0 | 843393.0 |
1059 rows × 2 columns
# read csv as pandas dataframe
# acf351b_ratings.csv
rating_file = '../data/acf351b_ratings.csv'
ratings = pd.read_csv(rating_file)
ratings
issue_id | rating_type | rating_date | rating | |
---|---|---|---|---|
0 | 5 | DPR | 5/20/1993 | NR |
1 | 5 | MR | 5/24/2017 | NR |
2 | 5 | FR | 12/14/2018 | A- |
3 | 5 | SPR | 5/16/2019 | BB+ |
4 | 6 | DPR | 10/13/1993 | NR |
... | ... | ... | ... | ... |
3231 | 834537 | FR | 6/25/2019 | NR |
3232 | 834537 | MR | 8/1/2019 | A3 |
3233 | 843393 | MR | 8/1/2019 | Baa3 |
3234 | 843393 | FR | 8/6/2019 | NR |
3235 | 843393 | SPR | 8/8/2019 | BBB |
3236 rows × 4 columns
# change dtype of rating_date from string to date
ratings['rating_date'] = pd.to_datetime(ratings['rating_date'])
ratings
issue_id | rating_type | rating_date | rating | |
---|---|---|---|---|
0 | 5 | DPR | 1993-05-20 | NR |
1 | 5 | MR | 2017-05-24 | NR |
2 | 5 | FR | 2018-12-14 | A- |
3 | 5 | SPR | 2019-05-16 | BB+ |
4 | 6 | DPR | 1993-10-13 | NR |
... | ... | ... | ... | ... |
3231 | 834537 | FR | 2019-06-25 | NR |
3232 | 834537 | MR | 2019-08-01 | A3 |
3233 | 843393 | MR | 2019-08-01 | Baa3 |
3234 | 843393 | FR | 2019-08-06 | NR |
3235 | 843393 | SPR | 2019-08-08 | BBB |
3236 rows × 4 columns
# merge two dataframe by outer join
ratings_of_cusips = pd.merge(cusips, ratings, how='outer', on ='issue_id')
ratings_of_cusips
cusip_id | issue_id | rating_type | rating_date | rating | |
---|---|---|---|---|---|
0 | 00077TAA2 | 5.0 | DPR | 1993-05-20 | NR |
1 | 00077TAA2 | 5.0 | MR | 2017-05-24 | NR |
2 | 00077TAA2 | 5.0 | FR | 2018-12-14 | A- |
3 | 00077TAA2 | 5.0 | SPR | 2019-05-16 | BB+ |
4 | 00077TAB0 | 6.0 | DPR | 1993-10-13 | NR |
... | ... | ... | ... | ... | ... |
3231 | 674599CT0 | 843393.0 | FR | 2019-08-06 | NR |
3232 | 674599CT0 | 843393.0 | SPR | 2019-08-08 | BBB |
3233 | NaN | 785195.0 | SPR | 2018-10-26 | NR |
3234 | NaN | 785195.0 | FR | 2018-10-26 | NR |
3235 | NaN | 785195.0 | MR | 2019-05-16 | A3 |
3236 rows × 5 columns
# test
# get the cusio_id of each row
for row in yields_median_usa_gbr.iterrows():
print(row[1]['CUSIP_ID'])
break
00077TAA2
# test
# locate the ratings of yield row
for row in yields_median_usa_gbr.iterrows():
ratings_of_row = ratings_of_cusips.loc[ratings_of_cusips['cusip_id'] == row[1]['CUSIP_ID']]
date_of_row = row[1]['trd_exctn_dt']
break
print(ratings_of_row)
cusip_id issue_id rating_type rating_date rating
0 00077TAA2 5.0 DPR 1993-05-20 NR
1 00077TAA2 5.0 MR 2017-05-24 NR
2 00077TAA2 5.0 FR 2018-12-14 A-
3 00077TAA2 5.0 SPR 2019-05-16 BB+
# test
# get the latest credit ratings
print(ratings_of_row['rating_date'].sort_values())
print(date_of_row)
ind = ratings_of_row['rating_date'].sort_values().searchsorted(date_of_row)
ind-1
0 1993-05-20
1 2017-05-24
2 2018-12-14
3 2019-05-16
Name: rating_date, dtype: datetime64[ns]
2015-01-26 00:00:00
0
# the monthly yield data generate in part1(a)
yields_median_usa_gbr
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | cusip_id | offering_amt | offering_date | maturity | bond_type | security_level | yankee | country_domicile | coupon | asset_backed | putable | convertible | callable | issue_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
1 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
2 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
3 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
4 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 00077TAA2 | 250000.0 | 1993-05-20 | 2023-05-15 | CDEB | SUB | N | USA | 7.75 | N | N | N | N | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11173 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11174 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11175 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11176 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 98934KAB6 | 300000.0 | 1993-11-09 | 2023-11-15 | CDEB | SEN | N | USA | 7.00 | N | N | N | N | 24061.0 |
11177 rows × 19 columns
from datetime import datetime
# get the latest rating level for each row in monthly yield data
ratings = []
for row in yields_median_usa_gbr.iterrows():
# get the ratings of row from ratings_of_cusips
ratings_of_row = ratings_of_cusips.loc[ratings_of_cusips['cusip_id'] == row[1]['CUSIP_ID']]
# get trd_exctn_dt of the row as "date_of_row"
date_of_row = row[1]['trd_exctn_dt']
# locate the latest rating
ind_to_insert = ratings_of_row['rating_date'].sort_values().searchsorted(date_of_row)
# get the latest rating from "ratings_of_row"
rating_of_row = ratings_of_row.iloc[ind_to_insert - 1]['rating']
# get the rating type of latest rating from "ratings_of_row"
ratetype_of_row = ratings_of_row.iloc[ind_to_insert - 1]['rating_type']
# get the rating date of latest rating from "ratings_of_row"
ratedate_of_row = ratings_of_row.iloc[ind_to_insert - 1]['rating_date']
# get cusip_id of the row
cusip_of_row = row[1]['CUSIP_ID']
# get the date of the row
trd_exctn_dt = row[1]['trd_exctn_dt']
# country of row
country = row[1]['country_domicile']
ratings.append({'cusip' : cusip_of_row , 'trd_exctn_dt': trd_exctn_dt, 'country': country, 'rating type': ratetype_of_row,
'rateing date': ratedate_of_row, 'rating' : rating_of_row})
# check results
# the length of ratings list is equal to the length of monthly yield dataframe
len(ratings)
11177
ratings[0]
{'cusip': '00077TAA2',
'trd_exctn_dt': Timestamp('2015-01-26 00:00:00'),
'country': 'USA',
'rating type': 'DPR',
'rateing date': Timestamp('1993-05-20 00:00:00'),
'rating': 'NR'}
# write results
import csv
with open('../data/export/ratings_with_country.csv', 'w') as f: # You will need 'wb' mode in Python 2.x
for i, d in enumerate(ratings):
w = csv.DictWriter(f, d.keys())
if i == 0:
w.writeheader()
w.writerow(d)
💲5. Compute monthly credit
5.1 Time to maturity
Credit spread
譯自英文-
在金融中,信用利差或淨信用利差是一種期權策略,它涉及購買一種期權和出售另一種期權,它們具有相同的等級和到期日,但執行價格不同。它的目的是賺取利潤時,這兩個選項之間的價差縮小。投資者在進入頭寸時獲得淨信用,並希望價差縮小或到期以獲取利潤。相反,投資者必須付費才能輸入債務價差。
维基百科(英文)
What Is a Maturity Date?
The maturity date is the date on which the principal amount of a note, draft, acceptance bond or other debt instrument becomes due. On this date, which is generally printed on the certificate of the instrument in question, the principal investment is repaid to the investor, while the interest payments that were regularly paid out during the life of the bond, cease to roll in. The maturity date also refers to the termination date (due date) on which an installment loan must be paid back in full.
ref: https://www.investopedia.com/terms/m/maturitydate.asp
# read csv as pandas dataframe
# acf351b_python_data.csv
cusips_file = '../data/acf351b_python_data.csv'
cusips = pd.read_csv(cusips_file, usecols=[0, 3, 7])
cusips
cusip_id | maturity | country_domicile | |
---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA |
1 | 00077TAB0 | 2093-10-15 | USA |
2 | 001192AA1 | 2011-01-14 | USA |
3 | 00163XAM2 | 2013-08-15 | USA |
4 | 00206RBS0 | 2016-02-12 | USA |
... | ... | ... | ... |
1055 | 961548AQ7 | 2027-03-15 | USA |
1056 | 961548AS3 | 2027-06-15 | USA |
1057 | 984121CK7 | 2020-09-01 | USA |
1058 | 98934KAB6 | 2023-11-15 | USA |
1059 | 989701AU1 | 2014-05-15 | USA |
1060 rows × 3 columns
# the bond transaction data from Part1(a)
yields_median
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | cusip_id | |
---|---|---|---|---|---|---|
1198533 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 00077TAA2 |
1198535 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 00077TAA2 |
111592 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 00077TAA2 |
111594 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 00077TAA2 |
111597 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 00077TAA2 |
... | ... | ... | ... | ... | ... | ... |
56809 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 98934KAB6 |
56840 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 98934KAB6 |
56855 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 98934KAB6 |
56862 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 98934KAB6 |
56881 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 98934KAB6 |
11177 rows × 6 columns
yields_median['cusip_id'] = yields_median['CUSIP_ID']
yields_median
<ipython-input-278-c026565aa96d>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
yields_median['cusip_id'] = yields_median['CUSIP_ID']
CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | cusip_id | |
---|---|---|---|---|---|---|
1198533 | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 00077TAA2 |
1198535 | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 00077TAA2 |
111592 | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 00077TAA2 |
111594 | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 00077TAA2 |
111597 | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 00077TAA2 |
... | ... | ... | ... | ... | ... | ... |
56809 | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 98934KAB6 |
56840 | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 98934KAB6 |
56855 | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 98934KAB6 |
56862 | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 98934KAB6 |
56881 | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 98934KAB6 |
11177 rows × 6 columns
# merge two dataframe by inner join
transactions_maturity = pd.merge(cusips, yields_median, how='inner', on ='cusip_id')
transactions_maturity
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | |
---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 |
11177 rows × 8 columns
transactions_maturity.dtypes
cusip_id object
maturity object
country_domicile object
CUSIP_ID object
trd_exctn_dt datetime64[ns]
trd_exctn_tm object
yld_pt float64
year_month period[M]
dtype: object
# transform maturity to datetime
transactions_maturity['maturity'] = pd.to_datetime(transactions_maturity['maturity'])
transactions_maturity.dtypes
cusip_id object
maturity datetime64[ns]
country_domicile object
CUSIP_ID object
trd_exctn_dt datetime64[ns]
trd_exctn_tm object
yld_pt float64
year_month period[M]
dtype: object
# get 'time to maturity' of dt.days type
transactions_maturity['time_to_maturity'] = transactions_maturity['maturity'] - transactions_maturity['trd_exctn_dt']
# transform to int type
transactions_maturity['time_to_maturity'] = transactions_maturity['time_to_maturity'].dt.days
# transform to year
transactions_maturity['time_to_maturity'] = transactions_maturity['time_to_maturity'] / 365.25
# round to 2 decimals
transactions_maturity['time_to_maturity'] = np.around(transactions_maturity['time_to_maturity'], 2)
transactions_maturity
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | |
---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 8.30 |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 8.14 |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 8.04 |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 7.80 |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 7.73 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 |
11177 rows × 9 columns
5.2 Zero-coupon yield
百度百科:零息债券收益率(Zero-coupon yield)
transactions_maturity
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | |
---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 8.30 |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 8.14 |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 8.04 |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 7.80 |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 7.73 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 |
11177 rows × 9 columns
# read zero coupon file
zero_coupon_file = '../data/feds200628.csv'
zero_coupon = pd.read_csv(zero_coupon_file, skiprows=9, usecols=list(range(31)))
# get the columns of the dataframe
zero_coupon.columns
Index(['Unnamed: 0', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05',
'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11',
'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17',
'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23',
'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29',
'SVENY30'],
dtype='object')
# rename the unnamed date column
zero_coupon = zero_coupon.rename(columns={'Unnamed: 0': 'date'})
zero_coupon.dtypes
date object
SVENY01 float64
SVENY02 float64
SVENY03 float64
SVENY04 float64
SVENY05 float64
SVENY06 float64
SVENY07 float64
SVENY08 float64
SVENY09 float64
SVENY10 float64
SVENY11 float64
SVENY12 float64
SVENY13 float64
SVENY14 float64
SVENY15 float64
SVENY16 float64
SVENY17 float64
SVENY18 float64
SVENY19 float64
SVENY20 float64
SVENY21 float64
SVENY22 float64
SVENY23 float64
SVENY24 float64
SVENY25 float64
SVENY26 float64
SVENY27 float64
SVENY28 float64
SVENY29 float64
SVENY30 float64
dtype: object
zero_coupon['date'] = pd.to_datetime(zero_coupon['date'])
zero_coupon.dtypes
date datetime64[ns]
SVENY01 float64
SVENY02 float64
SVENY03 float64
SVENY04 float64
SVENY05 float64
SVENY06 float64
SVENY07 float64
SVENY08 float64
SVENY09 float64
SVENY10 float64
SVENY11 float64
SVENY12 float64
SVENY13 float64
SVENY14 float64
SVENY15 float64
SVENY16 float64
SVENY17 float64
SVENY18 float64
SVENY19 float64
SVENY20 float64
SVENY21 float64
SVENY22 float64
SVENY23 float64
SVENY24 float64
SVENY25 float64
SVENY26 float64
SVENY27 float64
SVENY28 float64
SVENY29 float64
SVENY30 float64
dtype: object
transactions_maturity.dtypes
cusip_id object
maturity datetime64[ns]
country_domicile object
CUSIP_ID object
trd_exctn_dt datetime64[ns]
trd_exctn_tm object
yld_pt float64
year_month period[M]
time_to_maturity float64
dtype: object
# transactions_maturity['zero_coupon'] =
# loc = zero_coupon.loc[zero_coupon['date'] == transactions_maturity['trd_exctn_dt']][round(transactions_maturity['time_to_maturity'])]
# loc = zero_coupon.loc[zero_coupon['date'] == transactions_maturity['trd_exctn_dt']]
# zero_coupon['date']
# transactions_maturity['trd_exctn_dt']
# merge two dataframe by left join
transactions_zerocoupon = pd.merge(transactions_maturity, zero_coupon, how='left', right_on ='date', left_on='trd_exctn_dt')
# sort rows
transactions_zerocoupon.sort_values(by=['cusip_id', 'year_month'])
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | date | ... | SVENY21 | SVENY22 | SVENY23 | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 8.30 | 2015-01-26 | ... | 2.3417 | 2.3649 | 2.3866 | 2.4067 | 2.4255 | 2.4430 | 2.4594 | 2.4748 | 2.4893 | 2.5028 |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 8.14 | 2015-03-24 | ... | 2.4235 | 2.4491 | 2.4731 | 2.4957 | 2.5168 | 2.5367 | 2.5553 | 2.5729 | 2.5895 | 2.6052 |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 8.04 | 2015-05-01 | ... | 2.7832 | 2.8154 | 2.8459 | 2.8746 | 2.9018 | 2.9276 | 2.9520 | 2.9751 | 2.9970 | 3.0179 |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 7.80 | 2015-07-27 | ... | 2.8422 | 2.8854 | 2.9288 | 2.9722 | 3.0157 | 3.0591 | 3.1025 | 3.1456 | 3.1886 | 3.2312 |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 7.73 | 2015-08-20 | ... | 2.6613 | 2.6985 | 2.7355 | 2.7723 | 2.8089 | 2.8453 | 2.8815 | 2.9173 | 2.9529 | 2.9880 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 | 2017-08-30 | ... | 2.6519 | 2.6866 | 2.7208 | 2.7544 | 2.7875 | 2.8199 | 2.8517 | 2.8829 | 2.9135 | 2.9434 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 | 2017-09-26 | ... | 2.6849 | 2.7176 | 2.7500 | 2.7821 | 2.8139 | 2.8452 | 2.8762 | 2.9066 | 2.9366 | 2.9661 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 | 2017-10-24 | ... | 2.8288 | 2.8605 | 2.8922 | 2.9239 | 2.9557 | 2.9874 | 3.0191 | 3.0506 | 3.0819 | 3.1129 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 | 2017-11-22 | ... | 2.6742 | 2.6985 | 2.7227 | 2.7467 | 2.7706 | 2.7944 | 2.8180 | 2.8414 | 2.8646 | 2.8877 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 | 2017-12-19 | ... | 2.7453 | 2.7672 | 2.7896 | 2.8125 | 2.8357 | 2.8593 | 2.8833 | 2.9075 | 2.9320 | 2.9566 |
11177 rows × 40 columns
# test format
year = f'SVENY{round(8.30):02d}'
year
'SVENY08'
transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[f"SVENY{round(transactions_zerocoupon['time_to_maturity']):02d}"]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-298-066c4c5b005f> in <module>
----> 1 transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[f"SVENY{round(transactions_zerocoupon['time_to_maturity']):02d}"]
TypeError: unsupported format string passed to Series.__format__
transactions_zerocoupon.columns
Index(['cusip_id', 'maturity', 'country_domicile', 'CUSIP_ID', 'trd_exctn_dt',
'trd_exctn_tm', 'yld_pt', 'year_month', 'time_to_maturity', 'date',
'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06',
'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12',
'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18',
'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24',
'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30'],
dtype='object')
transactions_zerocoupon[transactions_zerocoupon.columns[10]]
0 0.2029
1 0.3018
2 0.2832
3 0.3333
4 0.4325
...
11172 1.2249
11173 1.3007
11174 1.4183
11175 1.5927
11176 1.7084
Name: SVENY01, Length: 11177, dtype: float64
transactions_zerocoupon['time_to_maturity'].max()
82.33
# round to the closed year
transactions_zerocoupon['year'] = transactions_zerocoupon['time_to_maturity'].transform(lambda x : 30 if x > 29 else (x + 1 if x == 0 or not x.is_integer() else x))
# some test samples of the above lambda function
'''
x function(x)
0 1
0.3 1.3
0.5 1.5
1 1
1.3 2.3
2 2
5. 5
9.04 10.04
29 29
29.1 30
30 30
31 30
'''
'\n\nx function(x)\n\n0 1\n0.3 1.3\n0.5 1.5\n1 1\n1.3 2.3\n2 2\n5. 5\n9.04 10.04\n29 29\n29.1 30\n30 30\n31 30\n\n'
transactions_zerocoupon['year'].min()
1.0
# transactions_zerocoupon['year'] = transactions_zerocoupon['year'].transform(lambda x: 1 if x < 0 else x)
transactions_zerocoupon
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | date | ... | SVENY22 | SVENY23 | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 8.30 | 2015-01-26 | ... | 2.3649 | 2.3866 | 2.4067 | 2.4255 | 2.4430 | 2.4594 | 2.4748 | 2.4893 | 2.5028 | 9.30 |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 8.14 | 2015-03-24 | ... | 2.4491 | 2.4731 | 2.4957 | 2.5168 | 2.5367 | 2.5553 | 2.5729 | 2.5895 | 2.6052 | 9.14 |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 8.04 | 2015-05-01 | ... | 2.8154 | 2.8459 | 2.8746 | 2.9018 | 2.9276 | 2.9520 | 2.9751 | 2.9970 | 3.0179 | 9.04 |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 7.80 | 2015-07-27 | ... | 2.8854 | 2.9288 | 2.9722 | 3.0157 | 3.0591 | 3.1025 | 3.1456 | 3.1886 | 3.2312 | 8.80 |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 7.73 | 2015-08-20 | ... | 2.6985 | 2.7355 | 2.7723 | 2.8089 | 2.8453 | 2.8815 | 2.9173 | 2.9529 | 2.9880 | 8.73 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 | 2017-08-30 | ... | 2.6866 | 2.7208 | 2.7544 | 2.7875 | 2.8199 | 2.8517 | 2.8829 | 2.9135 | 2.9434 | 7.21 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 | 2017-09-26 | ... | 2.7176 | 2.7500 | 2.7821 | 2.8139 | 2.8452 | 2.8762 | 2.9066 | 2.9366 | 2.9661 | 7.14 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 | 2017-10-24 | ... | 2.8605 | 2.8922 | 2.9239 | 2.9557 | 2.9874 | 3.0191 | 3.0506 | 3.0819 | 3.1129 | 7.06 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 | 2017-11-22 | ... | 2.6985 | 2.7227 | 2.7467 | 2.7706 | 2.7944 | 2.8180 | 2.8414 | 2.8646 | 2.8877 | 6.98 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 | 2017-12-19 | ... | 2.7672 | 2.7896 | 2.8125 | 2.8357 | 2.8593 | 2.8833 | 2.9075 | 2.9320 | 2.9566 | 6.91 |
11177 rows × 41 columns
# 事先创建“zero_coupon”列,来避免“Wrong number of items passed 11177, placement implies 1”错误
transactions_zerocoupon['zero_coupon'] = 0
# For each bond issue and trade, locate the zero-coupon yield of US treasury with the time to maturity closest to the bond issue’s.
transactions_zerocoupon['zero_coupon'] = transactions_zerocoupon[transactions_zerocoupon.columns[transactions_zerocoupon['year'].astype(int) + 8]]
transactions_zerocoupon['year']
0 9.30
1 9.14
2 9.04
3 8.80
4 8.73
...
11172 7.21
11173 7.14
11174 7.06
11175 6.98
11176 6.91
Name: year, Length: 11177, dtype: float64
5.3 Credit spreads
# take the difference between the yields of the bond issue and the zero-coupon yield and the result is the credit spreads.
transactions_zerocoupon['credit_spreads'] = transactions_zerocoupon['yld_pt'] - transactions_zerocoupon['zero_coupon']
transactions_zerocoupon
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | date | ... | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | year | zero_coupon | credit_spreads | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-01-26 | 16:01:58 | NaN | 2015-01 | 8.30 | 2015-01-26 | ... | 2.4067 | 2.4255 | 2.4430 | 2.4594 | 2.4748 | 2.4893 | 2.5028 | 9.30 | 1.7586 | NaN |
1 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-03-24 | 16:37:01 | NaN | 2015-03 | 8.14 | 2015-03-24 | ... | 2.4957 | 2.5168 | 2.5367 | 2.5553 | 2.5729 | 2.5895 | 2.6052 | 9.14 | 1.7968 | NaN |
2 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-05-01 | 11:03:37 | NaN | 2015-05 | 8.04 | 2015-05-01 | ... | 2.8746 | 2.9018 | 2.9276 | 2.9520 | 2.9751 | 2.9970 | 3.0179 | 9.04 | 2.0255 | NaN |
3 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-07-27 | 9:14:28 | NaN | 2015-07 | 7.80 | 2015-07-27 | ... | 2.9722 | 3.0157 | 3.0591 | 3.1025 | 3.1456 | 3.1886 | 3.2312 | 8.80 | 2.1064 | NaN |
4 | 00077TAA2 | 2023-05-15 | USA | 00077TAA2 | 2015-08-20 | 12:06:06 | NaN | 2015-08 | 7.73 | 2015-08-20 | ... | 2.7723 | 2.8089 | 2.8453 | 2.8815 | 2.9173 | 2.9529 | 2.9880 | 8.73 | 1.987 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 | 2017-08-30 | ... | 2.7544 | 2.7875 | 2.8199 | 2.8517 | 2.8829 | 2.9135 | 2.9434 | 7.21 | 2.0507 | 0.67949 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 | 2017-09-26 | ... | 2.7821 | 2.8139 | 2.8452 | 2.8762 | 2.9066 | 2.9366 | 2.9661 | 7.14 | 2.1521 | 0.70636 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 | 2017-10-24 | ... | 2.9239 | 2.9557 | 2.9874 | 3.0191 | 3.0506 | 3.0819 | 3.1129 | 7.06 | 2.3256 | 0.629487 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 | 2017-11-22 | ... | 2.7467 | 2.7706 | 2.7944 | 2.8180 | 2.8414 | 2.8646 | 2.8877 | 6.98 | 2.2634 | 0.672886 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 | 2017-12-19 | ... | 2.8125 | 2.8357 | 2.8593 | 2.8833 | 2.9075 | 2.9320 | 2.9566 | 6.91 | 2.41 | 0.550962 |
11177 rows × 43 columns
# drop掉包含nan的行
# transactions_zerocoupon = transactions_zerocoupon.dropna()
transactions_zerocoupon = transactions_zerocoupon[transactions_zerocoupon['zero_coupon'].notna()]
transactions_zerocoupon = transactions_zerocoupon[transactions_zerocoupon['yld_pt'].notna()]
transactions_zerocoupon
cusip_id | maturity | country_domicile | CUSIP_ID | trd_exctn_dt | trd_exctn_tm | yld_pt | year_month | time_to_maturity | date | ... | SVENY24 | SVENY25 | SVENY26 | SVENY27 | SVENY28 | SVENY29 | SVENY30 | year | zero_coupon | credit_spreads | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | 00077TAB0 | 2093-10-15 | USA | 00077TAB0 | 2015-01-05 | 14:27:45 | 6.525827 | 2015-01 | 78.78 | 2015-01-05 | ... | 2.5902 | 2.6174 | 2.6438 | 2.6694 | 2.6942 | 2.7183 | 2.7415 | 30.00 | 1.964 | 4.56183 |
22 | 00077TAB0 | 2093-10-15 | USA | 00077TAB0 | 2015-02-17 | 11:24:15 | 6.897267 | 2015-02 | 78.66 | 2015-02-17 | ... | 2.7596 | 2.7781 | 2.7953 | 2.8114 | 2.8264 | 2.8405 | 2.8537 | 30.00 | 2.0779 | 4.81937 |
23 | 00077TAB0 | 2093-10-15 | USA | 00077TAB0 | 2015-03-31 | 14:59:18 | 6.816467 | 2015-03 | 78.54 | 2015-03-31 | ... | 2.5792 | 2.6016 | 2.6228 | 2.6427 | 2.6616 | 2.6794 | 2.6963 | 30.00 | 1.8462 | 4.97027 |
24 | 00077TAB0 | 2093-10-15 | USA | 00077TAB0 | 2015-04-24 | 12:10:55 | 6.029157 | 2015-04 | 78.48 | 2015-04-24 | ... | 2.6637 | 2.6893 | 2.7135 | 2.7363 | 2.7578 | 2.7781 | 2.7972 | 30.00 | 1.8252 | 4.20396 |
25 | 00077TAB0 | 2093-10-15 | USA | 00077TAB0 | 2015-06-04 | 11:17:07 | 6.404942 | 2015-06 | 78.37 | 2015-06-04 | ... | 3.1048 | 3.1394 | 3.1731 | 3.2059 | 3.2379 | 3.2689 | 3.2991 | 30.00 | 2.2132 | 4.19174 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11172 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-08-30 | 10:53:27 | 2.730190 | 2017-08 | 6.21 | 2017-08-30 | ... | 2.7544 | 2.7875 | 2.8199 | 2.8517 | 2.8829 | 2.9135 | 2.9434 | 7.21 | 2.0507 | 0.67949 |
11173 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-09-26 | 13:56:25 | 2.858460 | 2017-09 | 6.14 | 2017-09-26 | ... | 2.7821 | 2.8139 | 2.8452 | 2.8762 | 2.9066 | 2.9366 | 2.9661 | 7.14 | 2.1521 | 0.70636 |
11174 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-10-24 | 12:57:12 | 2.955087 | 2017-10 | 6.06 | 2017-10-24 | ... | 2.9239 | 2.9557 | 2.9874 | 3.0191 | 3.0506 | 3.0819 | 3.1129 | 7.06 | 2.3256 | 0.629487 |
11175 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-11-22 | 9:27:49 | 2.936286 | 2017-11 | 5.98 | 2017-11-22 | ... | 2.7467 | 2.7706 | 2.7944 | 2.8180 | 2.8414 | 2.8646 | 2.8877 | 6.98 | 2.2634 | 0.672886 |
11176 | 98934KAB6 | 2023-11-15 | USA | 98934KAB6 | 2017-12-19 | 12:22:55 | 2.960962 | 2017-12 | 5.91 | 2017-12-19 | ... | 2.8125 | 2.8357 | 2.8593 | 2.8833 | 2.9075 | 2.9320 | 2.9566 | 6.91 | 2.41 | 0.550962 |
7548 rows × 43 columns
transactions_zerocoupon.to_csv('../data/export/part1e.csv')