Fuzzy regression model with Bayesian approach and its application to public health data

This paper focuses on the application of Bayesian Linear Regression (BLR) and Fuzzy Bayesian Linear Regression through a SAS algorithm. The modified method can serve as an alternative approach to data analysis in biostatistics. It incorporates a bootstrapping technique, residual normality checking, and an enhancement of Bayesian Linear Regression (BLR) modeling through Fuzzy Bayesian Linear Regression. We illustrate the application of the algorithm for both BLR and Fuzzy Bayesian Linear Regression.


Introduction
Bayesian Linear Regression (BLR) analysis is an approach to linear regression in which the statistical analysis is undertaken within the framework of Bayesian inference. The technique can be used to forecast the value of the response (dependent) variable for any given values of the predictor (independent) variables. A general regression model is given by $y_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \varepsilon_i$, where $i = 1, 2, 3, \ldots, n$ indexes the observations, $y_i$ is the response variable, and $\mathbf{x}_i$ is a vector of independent variables.
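Such a model can be estimated in a Bayesian way with a conjugate Gaussian prior, as developed in the next paragraph. As a concrete preview, the following sketch is in Python/NumPy rather than the paper's SAS code; the simulated data, the weak Gaussian prior, and the assumption of a known error variance are choices of the sketch, not part of the paper's method.

```python
import numpy as np

def blr_posterior(X, y, sigma2, m0, S0):
    """Posterior mean and covariance of beta for y ~ N(X beta, sigma2 I)
    under a conjugate Gaussian prior beta ~ N(m0, S0), with sigma2 known."""
    S0_inv = np.linalg.inv(S0)
    Sn = np.linalg.inv(S0_inv + X.T @ X / sigma2)  # posterior covariance
    mn = Sn @ (S0_inv @ m0 + X.T @ y / sigma2)     # posterior mean
    return mn, Sn

# Simulated data (an assumption of this sketch): y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

# Weak prior N(0, 10 I) and known sigma^2 = 0.25 (also assumptions)
mn, Sn = blr_posterior(X, y, sigma2=0.25,
                       m0=np.zeros(2), S0=10.0 * np.eye(2))
print(mn)  # posterior mean, near the simulating values (1, 2)
```

With a prior this weak, the posterior mean essentially reproduces the least-squares estimate, which is why the printed values sit near the simulating coefficients.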
$\mathbf{x}_i^{T}\boldsymbol{\beta}$ is the expectation of $y_i$ conditional on $\mathbf{x}_i$, and $\varepsilon_i$ is the error term. This paper provides an algorithm for Bayesian Multiple Linear Regression (BMLR) in SAS (Diem Ngo & La Puente, 2012). Assume a BLR model in which the response vector $\mathbf{y}$, of dimension $n \times 1$, follows a multivariate Gaussian distribution with mean $X\boldsymbol{\beta}$ and covariance matrix $\sigma^{2}I$, where the design matrix $X$ has dimension $n \times p$, $\boldsymbol{\beta}$ contains the $p$ regression coefficients, $\sigma^{2}$ is the common variance of the observations, and $I$ is the $n \times n$ identity matrix. That is, $\mathbf{y} \sim N(X\boldsymbol{\beta}, \sigma^{2}I)$. In the Bayesian approach, the data are supplemented with additional information in the form of a prior probability distribution. The prior belief about the parameters is combined with the likelihood function of the data according to Bayes' theorem to yield the posterior belief about the parameters $\boldsymbol{\beta}$ and $\sigma^{2}$ (Gelman et al., 2013; Gelman & Hill, 2006). Data transformation tools are commonly used to improve the normality of a distribution and to equalize variances, so that assumptions are met and effect sizes improved; they are therefore an important part of cleaning and preparing data for statistical analysis. The traditional transformations commonly discussed include adding constants, square roots, conversion to logarithmic scales, inverting and reflecting, and trigonometric transformations such as sine-wave transformations (Osborne, 2010). This study uses the Box-Cox transformation, which takes the form

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\ \log y, & \lambda = 0, \end{cases}$$

where $y$ is the observed data and $\lambda$ is the model parameter. The optimal value of $\lambda$ was determined, and this study used $\lambda = 2$. An example of the application of the method using the SAS language is provided (Osborne, 2010). The bootstrap method begins with an original sample taken from the population, from which the sample statistics are calculated. The next step is to resample the original sample with replacement, using the empirical distribution function (EDF), to create a pseudo-population (Efron & Tibshirani, 1993). A benefit of the bootstrap is its ability to generate samples of the same size as the original, in which an observation may appear several times while other observations are omitted. The bootstrap method draws the samples with replacement and calculates the statistics for each sample, storing them to create a distribution for further analysis. After the bootstrap is complete, the resulting distribution is summarized by its mean, standard deviation, confidence intervals, and any other evidence of replication (Cassel, 2010; Jung, Jhun, & Lee, 2005; Higgins, 2005). In applying the bootstrap method, the original findings from the empirical test were replicated several times to meet the research requirement. As an example, for 1000 observations (original data), the analysis is performed using a linear statistical model; the beta coefficients and R-squared values are obtained, and the bootstrapping method is then applied to the selected data. In this study, a sample of 23 observations was replicated five times, giving 115 observations. The beta coefficients and R-squared values from the linear model fitted to the bootstrap samples were compared with the original results. The bootstrap findings show average beta coefficients and R-squared values similar to the original findings from which they were replicated.
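The resampling scheme just described (draw with replacement, refit the linear model, store the statistics, summarize the stored distribution) can be sketched as follows. The 23 simulated observations below stand in for the study data, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "original" sample of n = 23 observations (illustrative only)
n = 23
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

# Statistics of the original sample: least-squares beta coefficients
beta_orig, *_ = np.linalg.lstsq(X, y, rcond=None)

# Draw B samples with replacement, refit, and store each estimate
B = 1000
betas = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    betas[b], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

# Summarize the stored bootstrap distribution
print(beta_orig)           # original estimates
print(betas.mean(axis=0))  # bootstrap average, close to the original
print(betas.std(axis=0))   # bootstrap standard errors
```

As the text notes, the bootstrap averages track the estimates from the original sample, while the spread of the stored statistics supplies standard errors and confidence intervals.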
Interestingly, the bootstrap method also opens opportunities for further comprehensive study in both science and non-science disciplines. A fuzzy regression model can be written as

$$\tilde{Y} = \tilde{A}_0 + \tilde{A}_1 x_1 + \cdots + \tilde{A}_k x_k,$$

where the explanatory variables $x_1, \ldots, x_k$ are assumed to be precise. According to the equation above, however, the response variable $\tilde{Y}$ is not crisp but fuzzy, and the same applies to the parameters $\tilde{A}_0, \ldots, \tilde{A}_k$, which we aim to estimate. In the following discussion, the $\tilde{A}_j$ are assumed to be symmetric fuzzy numbers, which can be presented by intervals. For example, $\tilde{A}_j$ can be expressed as the fuzzy set $\tilde{A}_j = \langle a_j, c_j \rangle$, where $a_j$ is the center and $c_j$ is the radius, or vagueness, associated with it. This fuzzy set reflects the confidence in the regression coefficient around $a_j$ in terms of a symmetric triangular membership function. The application of this method deserves particular attention when the underlying phenomenon is fuzzy, that is, when the response variable is fuzzy. Then the relationship is also considered fuzzy, and the model can be written as $\tilde{Y}_i = \langle y_i^{c}, y_i^{r} \rangle$ with center $y_i^{c} = a_0 + \sum_{j} a_j x_{ij}$ and radius $y_i^{r} = c_0 + \sum_{j} c_j |x_{ij}|$. The estimation is commonly carried out by solving a linear programming problem (Kacprzyk & Fedrizzi, 1992). The data for this study comprise a sample with four variables.
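The linear programming formulation referred to above can be illustrated with a Tanaka-style possibilistic regression: minimize the total spread of the fuzzy coefficients subject to every observation lying inside the fuzzy output band. The sketch below uses scipy.optimize.linprog; the toy data and the inclusion level h = 0 are assumptions of the example, not values from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def tanaka_fuzzy_regression(X, y, h=0.0):
    """Possibilistic (Tanaka-style) fuzzy linear regression.
    Each coefficient is a symmetric triangular fuzzy number <a_j, c_j>
    (center a_j, spread c_j >= 0).  The LP minimises the total spread
    while requiring every y_i to lie inside the fuzzy output at level h."""
    n, p = X.shape
    absX = np.abs(X)
    # decision vector z = [a_1..a_p, c_1..c_p]; only the spreads are costed
    cost = np.concatenate([np.zeros(p), absX.sum(axis=0)])
    A_ub = np.vstack([
        np.hstack([-X, -(1 - h) * absX]),  # y_i <= a'x_i + (1-h) c'|x_i|
        np.hstack([ X, -(1 - h) * absX]),  # a'x_i - (1-h) c'|x_i| <= y_i
    ])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * p  # centers free, spreads >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p], res.x[p:]  # centers a, spreads c

# Toy illustration: an intercept column plus one regressor
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.1, 3.9, 6.2, 7.8])
a, c = tanaka_fuzzy_regression(X, y)
print(a, c)  # centers and spreads; every y_i lies inside the fuzzy band
```

The optimal spreads are just wide enough that the fuzzy band covers all observations, which is the possibilistic analogue of the prediction intervals compared later in the paper.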

Sample size determination
The sample size for the multiple regression analysis was calculated using G*Power with effect size f² = 0.15, α = 0.05, a study power of 0.80, and three predictors. The minimum required sample size is 77 respondents.
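G*Power's fixed-model F test of R² deviation from zero uses the noncentral F distribution with noncentrality λ = f²N. The calculation can be reproduced approximately with SciPy; this sketch assumes that parameterization and is not the G*Power implementation.

```python
from scipy.stats import f, ncf

def n_for_regression(f2=0.15, alpha=0.05, power=0.80, n_pred=3):
    """Smallest total N for a multiple-regression F test (fixed model,
    R^2 deviation from zero), using noncentrality lambda = f2 * N and
    degrees of freedom (n_pred, N - n_pred - 1)."""
    for n in range(n_pred + 2, 10000):
        u, v = n_pred, n - n_pred - 1
        fcrit = f.ppf(1 - alpha, u, v)            # critical value under H0
        achieved = 1 - ncf.cdf(fcrit, u, v, f2 * n)  # power under H1
        if achieved >= power:
            return n

print(n_for_regression())  # minimum N; the text reports 77
```

The returned value matches the G*Power result up to possible off-by-one differences in how the noncentrality parameter is rounded.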

Algorithm and flow chart for modified Bayesian linear regression analysis method
The algorithm for the modified Bayesian linear regression analysis method is presented in Figure 1.

Results from Bayesian multiple linear regression
The results from the Bayesian multiple linear regression are presented in Table 2.
The upper and lower limits of the prediction interval are computed from the prediction equation (2.1) by taking each coefficient as its estimated value plus or minus its standard error (see Table 3 and Table 4). The width of the prediction interval for the Bayesian multiple linear regression model and for the Bayesian fuzzy regression model, corresponding to each set of observed explanatory variables, was computed in SPSS, and the results are reported in Table 5. From this table, the average width for the former is 0.005600, while that of the latter is only 0.003170, indicating the superiority of the fuzzy regression methodology.
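As an illustration of the width computation, the sketch below evaluates the prediction equation with each coefficient shifted by plus or minus its standard error and averages the resulting interval widths. The coefficients, standard errors, and design rows are hypothetical placeholders, not the values reported in Tables 2 to 5.

```python
import numpy as np

# Hypothetical estimates and standard errors (illustration only)
beta = np.array([0.12, 0.031, -0.004])  # intercept + two predictors
se   = np.array([0.02, 0.005,  0.001])

# Hypothetical design rows (intercept column first)
X = np.array([[1.0, 2.5, 10.0],
              [1.0, 3.1, 12.0],
              [1.0, 1.8,  9.0]])

# Shift each coefficient by +/- its SE, choosing the sign that widens
# the interval for each x_j; the width is then 2 * |X| @ se per row.
upper = X @ beta + np.abs(X) @ se
lower = X @ beta - np.abs(X) @ se
width = upper - lower
print(width.mean())  # average interval width across observations
```

The same per-observation widths, averaged over the sample, are what Table 5 compares between the two models.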

Conclusion
This paper presented an algorithm and illustrated the modeling procedure for modified Bayesian linear regression in the SAS language. Our aim is to share the algorithm and to provide researchers with an alternative program suitable for small sample sizes. The proposed method can be applied to small-sample data, especially when only limited data can be obtained, as is common in public health.