Big Data in Healthcare Management: A Review of Literature

A systematic literature review of papers on big data in healthcare published between 2010 and 2015 was conducted. This paper reviews the definition, process


Introduction
The healthcare industry is one of the world's largest and fastest-growing industries. In recent years, healthcare management around the world has been shifting from a disease-centered to a patient-centered model [1] and from a volume-based to a value-based healthcare delivery model [2]. Improving the quality of health care and reducing its cost are the principles behind the growing movement toward value-based healthcare delivery and patient-centered care. The volume of, and demand for, big data in healthcare organizations is growing steadily [3]. To provide effective patient-centered care, it is essential to manage and analyze huge volumes of health data. Traditional data management tools are not sufficient to analyze big data, as the variety and volume of data sources have increased over the past two decades. There is a need for new and innovative big data tools and technologies that can meet and exceed the demands of managing healthcare data [4]. Research predicts that worldwide big data spending in the healthcare industry will grow at a compound annual growth rate (CAGR) of 42% during 2014-2019 [5].
Big data is used to predict diseases before they emerge, based on medical records. Many countries' public health systems now provide electronic patient records with advanced medical imaging media [6]. The use of big data has the potential to meet future market needs and trends in healthcare organizations [7]. Big data provides a great opportunity for epidemiologists, physicians, and health policy experts to make data-driven decisions that will eventually improve patient care [8]. The authors used Google Trends to analyze interest in the term 'big data in healthcare' between 2010 and 2015. The resulting graph is shown in Figure 1. Google Trends is a freely available online portal of Google Inc. that allows users to interact with Internet search data, which may provide deep insights into people's behavior and health-related events. Google Flu Trends and Google Dengue Trends are used for nowcasting the spread of diseases such as flu and dengue. Google Trends has been used in several research publications [9].
Figure 1 shows that interest in the term 'big data in healthcare' took off around early 2013. This increase in interest can be related to a popular report by McKinsey & Company that came out in early 2013 [10]. The report highlights that healthcare expenses account for about 17.6% of GDP and that big data has the potential to reduce healthcare spending by $300 billion to $450 billion.
Currently, most big data studies in healthcare concentrate merely on the technological understanding of big data [11]. Only a few studies have discussed big data analytics in healthcare information technology [12]. The main aim of this paper is to review the processes and applications of big data in healthcare management.
This paper is structured as follows. Section 2 introduces the definitions and different dimensions of big data in healthcare. Section 3 discusses the search strategy and the steps in the literature review of articles dealing with 'big data in healthcare'. Section 4 presents the process of big data management in healthcare. Section 5 discusses the applications, benefits, and challenges of big data in healthcare. Since privacy issues have been increasing recently, the laws related to the protection and management of health and medical data are also discussed in this section. Finally, Section 6 concludes the paper with recommendations for future research.

Definition of Big Data
Researchers have provided many definitions of big data [13][14]; however, no single definition is generally accepted [15]. Baro et al. [16] have done extensive research on the definition of big data and have proposed that a dataset qualifies as a "big dataset" only if log(n × p) is greater than or equal to 7. A report submitted to the U.S. Congress in August 2012 describes big data as "large volumes of high velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information" [17]. In this paper, we propose a new definition of big data in healthcare.
Big data in healthcare involves collecting large volumes of data from various healthcare sources, followed by storing, managing, analyzing, visualizing, and delivering information for effective decision making.
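The size criterion attributed to Baro et al. [16] above lends itself to a short illustration. The following is a minimal sketch rather than the authors' own implementation. It assumes that the logarithm is base 10, that n is the number of records (statistical individuals), and that p is the number of variables per record; the function name `is_big_dataset` is hypothetical.

```python
import math


def is_big_dataset(n, p, threshold=7.0):
    """Check the log(n * p) >= 7 criterion (base-10 logarithm assumed).

    n: number of records (assumed meaning of n)
    p: number of variables per record (assumed meaning of p)
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    return math.log10(n * p) >= threshold


if __name__ == "__main__":
    # 100,000 patients with 100 variables each: n * p = 10**7
    print(is_big_dataset(100_000, 100))
    # 1,000 patients with 50 variables each: n * p = 5 * 10**4
    print(is_big_dataset(1_000, 50))
```

Under these assumptions, a dataset reaches the threshold once it holds at least ten million individual data points.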
Big data in healthcare is associated with six characteristics, viz. volume, variety, velocity, veracity, variability, and value. The first three V's (volume, variety, and velocity) are widely used for characterizing big data and have been discussed by various researchers [18][19]. The remaining three V's (veracity, variability, and value) have also been discussed extensively by several researchers [20][21]. These six V's of big data are depicted in Figure 2.
Volume refers to the large quantity of data produced by an organization. Today healthcare data are measured in terabytes (10^12 bytes), petabytes (10^15 bytes), or exabytes (10^18 bytes) [22]. In future, the total volume of clinical data records will grow to zettabytes (10^21 bytes) or yottabytes (10^24 bytes). Such huge amounts of data create storage and large-scale analysis problems.
Variety arises from widely differing sources of data, or from mash-ups of data originating from independent sources in different formats [23]. The formats of data in healthcare can be classified as structured, semi-structured, or unstructured [24]. Structured data includes laboratory data, clinical data, sensor data, and data from relational databases [25]; semi-structured data includes data stored in Extensible Markup Language (XML) format [26]; and unstructured data is free-form data that usually has no precise structure, such as handwritten notes [27][28], X-ray images, radiological images, and other medical imaging [29][30], electronic medical records (EMR/EHR) [31], graphics, patient discharge summaries, physiological measurements (signals), and healthcare data from social media and mobile phones [32][33]. Ninety percent of big data is unstructured [34].
Velocity refers to the high rate at which data is created, delivered, and managed [35]. Velocity thus covers both the speed of data production and the speed of data processing needed to meet demand. The rapid growth of data is the third characteristic of big data [28,29,36]. The data generated can be either batch or real-time data [28]. The content of the data changes constantly through the absorption of complementary data collections, the introduction of previously archived or legacy collections, and the arrival of different forms of streamed data from multiple sources [25]. An example of velocity is the ageing of the population, which steadily increases the number of patients and in turn increases the growth rate of data by 55-60% every year [30].
Veracity refers to the correctness and accuracy of information [25]. Big data has low veracity: it can never be 100% accurate [37], and it is difficult to validate [38]. Since most of the data comes from unknown and unverified sources, it is essential to set up a standard to ensure the quality of the data before it is used.
Variability refers to fluctuations in data during processing and throughout its lifecycle. Increasing variety and variability also increases the attractiveness of data and its potential to provide unexpected, hidden, and valuable information [20].
Value is the process of extracting valuable information from huge sets of data, and it is usually referred to as big data analytics [25]. Data value supports sound decision making.
McKinsey & Company [10] frame the transformation of data in terms of what is right for the healthcare environment and right for the patient. They consider the following five pathways to value, based on the concept that value is derived from the balance of patient impact (outcomes) and healthcare spend (cost).
Right living: Patients must be encouraged to take an active part in their own health.
Right care: Patients must receive the most timely and appropriate treatment available.
Right provider: Any professionals who treat patients must have strong performance records and be skilled enough to achieve the best results.
Right value: Providers and payers should continually look for ways to improve value while maintaining or improving health-care quality.
Right innovation: Stakeholders must concentrate on identifying new therapies and approaches to health-care delivery.

Methods
A systematic literature review of articles dealing with 'big data in healthcare' was conducted. A detailed search of papers on big data in healthcare published between 2010 and 2015 was used for the review. Table 1 lists the peer-reviewed journal articles from the major publishers, viz. Science Direct, PubMed, Springer, Taylor & Francis, and Inderscience, along with other reports on big data in healthcare. The detailed search yielded 10,496 papers. Our strategy for selecting the final review papers, including each step of the search, is shown in Figure 3. After reviewing these articles, we discarded most of them. After removing irrelevant entries and duplicates, 573 papers were included for title review. After reading the titles, 459 papers on big data that did not use the word healthcare in their title were excluded. The abstracts of the remaining papers were then reviewed manually for eligibility: 36 papers were excluded because they were not directly related to healthcare, and the full text of the 78 papers relevant to healthcare was reviewed. Finally, the 76 papers that met all the inclusion criteria were retained.
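The screening counts reported above can be checked with simple arithmetic. The sketch below is only an illustration of that bookkeeping: the function name and step labels are hypothetical, and the final exclusion of 2 papers is not stated in the text but inferred from the difference between the 78 papers reviewed in full and the 76 retained.

```python
def apply_exclusions(start, exclusions):
    """Return (label, remaining) pairs after each exclusion step."""
    remaining = start
    trail = [("included for title review", remaining)]
    for label, excluded in exclusions:
        if excluded > remaining:
            raise ValueError(f"cannot exclude {excluded} from {remaining} papers")
        remaining -= excluded
        trail.append((label, remaining))
    return trail


# Exclusion counts taken from the text; the last one is inferred (78 - 76).
SCREENING_STEPS = [
    ("after title screening", 459),
    ("after abstract screening", 36),
    ("after full-text review (inferred)", 2),
]

if __name__ == "__main__":
    for label, count in apply_exclusions(573, SCREENING_STEPS):
        print(f"{label}: {count}")
```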

Process of Big Data Analysis in Healthcare Industry
Big data analytics has the potential to transform the way healthcare providers use sophisticated technologies to gain insight from their clinical and other data repositories and make informed decisions. Big data healthcare analytics has five processes: Data Acquisition, Data Storage, Data Management, Data Analytics, and Data Visualization & Reporting. Figure 4 presents the process of big data analysis in healthcare management.

Data Acquisition
Big data in healthcare can be structured, semi-structured, or unstructured [40] and can be acquired from primary sources (e.g., computerized physician order entry (CPOE), clinical decision support systems, and electronic health records) and secondary sources (e.g., laboratories, insurance companies, government sources, pharmacies, and HMOs) [6]. The following are the most important sources of big data in healthcare.
Electronic Health Records: In recent years, the digitization of health records has given hospitals a foundation of medical datasets [41]. Electronic health record data come from physician notes, lab reports, ECGs, scans, X-rays, health sensor devices, medical prescriptions, etc. These datasets are the foundation for personalized medicine and large cohort studies [42] in hospitals.
Image Processing: Medical images are an important source of data for analysis. Computed tomography (CT), photoacoustic imaging, ultrasound, molecular imaging, magnetic resonance imaging (MRI), mammography, fluoroscopy, positron emission tomography-computed tomography (PET-CT), and X-ray are some examples of imaging procedures that are well established in clinical settings [43].
Social Media: Healthcare data can be collected from social media such as Facebook, Twitter, and LinkedIn. Social media log data are commonly used for analyzing disease spread and transmission [44]. For example, collecting information from Twitter about people affected by a particular flu is faster than traditional methods. Social networking sites such as PatientsLikeMe (www.patientslikeme.com) have more than 200,000 patients and track more than 1,800 diseases [45].
Smartphones: Apps are among the most important sources of data in the area of healthcare self-management. Nowadays, smartphones run health-related apps such as pedometers and Fitbit, which produce a lot of data on the number of stairs climbed, steps walked, and calories burned. Another app, MoodPanda, is used to track an individual's mood along with anything from mental, emotional, and physical to social and environmental aspects of daily life. There are also apps for diabetes management, such as iBGStar, which monitors blood glucose. These apps create a lot of data every day that contributes to healthcare research. Van Heerden et al. [46] used mobile phones to collect maternal health information from HIV-positive pregnant women in South Africa. Zhang et al. [47] showed that smartphones can be used effectively for household data collection on infant feeding in rural China.
Web-based data: Websites are also an important source of healthcare data. Popular websites providing healthcare data are 23andMe and uBiome. 23andMe is a DNA analysis service that provides information and tools for individuals to learn about and explore their DNA (https://www.23andme.com). uBiome is a microbiome sequencing service that offers information and tools for exploring one's microbiome (http://ubiome.com) [48].

Data Storage
Storage plays a vital part in big data. As the size of data in the healthcare industry increases, an efficient and large storage platform is needed. For data at this scale, the cloud is the most promising technology. Clouds provide the elasticity and capabilities needed for accessing data, generating insight, accelerating the potential for scalable analytics solutions, and driving value. Cloud computing is a powerful and promising technology for storing data at enormous scale and performing large-scale, complex computing. It eliminates the need to maintain costly computing hardware, software, and dedicated space.
Using the cloud to analyze big data in healthcare makes sense for two reasons. First, investments in big data analysis can be significant and drive a need for efficient, cost-effective infrastructure.
Second, big data in healthcare organisations is a mix of internal and external sources: while organisations frequently keep patients' most sensitive data in-house, enormous volumes of big data generated by public providers and third parties may be located externally.
Hashem et al. stated that cloud computing infrastructure can serve as an effective platform to address the data storage required to perform big data analysis [49]. Table 2 describes different platforms for cloud storage, along with their respective vendors and tools, their purpose, advantages, and disadvantages, and their application in healthcare. Clinical applications are best deployed on private or hybrid clouds, since these applications require the highest level of security, privacy, and availability. Nonclinical applications are a better fit for public deployments but must still be carefully assessed.

Data Management
Data management in healthcare includes organizing, cleaning, retrieval, data mining, and data governance. It also includes validating whether the data contains junk data or missing values; such data needs to be removed [50]. Data management helps in the risk assessment of patients and in preparing personalized discharge plans. The major data management tools are Apache Ambari and HCatalog. These tools are explained in detail in Table 3.
Data retrieval is the process of extracting files or valuable information from large healthcare databases. Big data analysis in healthcare frequently involves information retrieval and data mining [51]. Wang et al. [52] mention that information retrieval is the process of searching within large document collections, and that in healthcare it mainly covers medical text retrieval and medical image retrieval.
Data governance refers to the overall management of the security, integrity, usability, and availability of the data employed in an enterprise. Maintaining the confidentiality of individual patient records is very important in healthcare management. This section provides information on key legislation on healthcare data accessibility and government regulations intended to address health data privacy. The important data governance acts and initiatives are HITECH, HIPAA, HDI, GINA, and FOIA.
The Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act of 2009, was signed into law on February 17, 2009, to promote the adoption and meaningful use of health information technology. The implementation of the HITECH Omnibus rule takes data management to a new level by mandating technological measures for maintaining the confidentiality of patient records. Subtitle D of the HITECH Act addresses the security and privacy concerns associated with the electronic transmission of health information, in part through several provisions that strengthen the civil and criminal enforcement of the HIPAA rules. HITECH requires all healthcare providers to implement electronic healthcare systems [53].
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) was designed to provide privacy standards to protect patients' health records and other medical information held by hospitals, doctors, health plans, and other healthcare providers. HIPAA Title II, also known as the Administrative Simplification provisions, directs the U.S. Department of Health and Human Services to create national standards for electronic healthcare transactions.
The goal of the HIPAA regulation is to ensure that individuals' health information is properly protected while permitting the flow of health information needed to provide and promote high-quality health care and to protect the public's health. Subsequent regulations guide covered entities (e.g., healthcare institutions) in protecting individually identifiable health information in research [54].
The Department of Health and Human Services (HHS), under the Health Data Initiative (HDI), proposed a reliable privacy-preserving medical recommendation system. In this system, patients can contribute their secured ratings of physicians for different health conditions based on their satisfaction [55].
The Genetic Information Nondiscrimination Act of 2008 (GINA) prohibits health plans and issuers from using genetic information to make eligibility, coverage, underwriting, or premium-setting decisions [The Genetic Information Nondiscrimination Act (GINA
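Returning to the validation step described at the start of this section, where records with junk or missing values are removed, the idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a tool named in the paper: records are assumed to be plain dictionaries, "missing" is assumed to mean an absent key, `None`, or an empty string, and the function and field names are hypothetical.

```python
# Values treated as missing in this sketch (an assumption, not from the paper).
MISSING = (None, "")


def split_complete_records(records, required_fields):
    """Separate records that have every required field from those that do not."""
    complete, rejected = [], []
    for record in records:
        has_all = all(record.get(field) not in MISSING for field in required_fields)
        (complete if has_all else rejected).append(record)
    return complete, rejected


if __name__ == "__main__":
    sample = [
        {"patient_id": "A1", "glucose": 5.4},
        {"patient_id": "A2", "glucose": None},  # missing measurement
        {"patient_id": "", "glucose": 6.1},     # missing identifier
        {"patient_id": "A4"},                   # field absent entirely
    ]
    kept, dropped = split_complete_records(sample, ["patient_id", "glucose"])
    print(len(kept), "kept;", len(dropped), "rejected")
```

Keeping the rejected records separate, rather than silently discarding them, leaves an audit trail, which matters in a setting governed by the privacy and integrity rules discussed above.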

Data Analytics
Data analytics is the process of transforming raw data into information. Big data analytics in healthcare is classified into descriptive, diagnostic, predictive, and prescriptive analytics [39]. Current research in academia and industry shows that retailers can achieve up to a 15 to 20% increase in ROI by putting big data into analytics [56].
Descriptive analytics: Looks at past performance based on historical data. Descriptive analytics is also known as unsupervised learning. It summarizes: What happened in healthcare management? What is the impact of a parameter on the system?
Diagnostic analytics: Uses historical data to identify the root cause of a problem and diagnose: Why did it happen?
Predictive analytics: Analyzes both real-time and historical data, and is also known as supervised learning. Because all predictive analytics are probabilistic in nature, it can only forecast what might happen; it cannot predict the future with certainty. It anticipates: What will happen? What are the future trends? What is the decision based on past history?
Prescriptive analytics: Automatically synthesizes big data and provides advice on a number of different possible outcomes before decisions are actually made. The decision maker can take this information and act on it. Prescriptive analytics is more advanced than descriptive and predictive analytics. It prescribes: What should we do? What is the best outcome, and how can we make it happen?
Table 3 describes the tools used to manage healthcare data. There are many analytical tools for analyzing the data. We have classified the tools into 8 different layers, and each layer is further subdivided into components.
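The difference between descriptive and predictive analytics described above can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not a method from the reviewed literature: the monthly admission counts are invented, the function names are hypothetical, and an ordinary least-squares trend line stands in for whatever predictive model a real system would use.

```python
from statistics import mean


def describe(values):
    """Descriptive step: summarize what already happened."""
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


def linear_forecast(values, steps_ahead=1):
    """Predictive step: extend a least-squares trend line into the future.

    The result is an estimate of what might happen, not a certainty.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


if __name__ == "__main__":
    monthly_admissions = [100, 110, 120, 130]  # invented illustrative data
    print(describe(monthly_admissions))
    print(linear_forecast(monthly_admissions))
```

The first function answers "what happened?", while the second answers "what might happen next?"; a prescriptive layer would go further and recommend an action based on such forecasts.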

Data Visualization
Data visualization presents the analytic results of healthcare data in pictorial or graphical form to support the understanding of complex data and better decision making. It can be used to understand patterns and correlations in the data. Table 4 describes the tools currently used for visualizing big data in healthcare, giving the description and features of each tool along with its application in healthcare.

Table 4. Visualization tools used in healthcare.

Tool: R
Description: R is a programming language for advanced statistics and data visualization.
Features: Graphical display, control, and manipulation of data; integrated tools for instantaneous analysis.
Applications: Pharmaceutical companies use R to plan clinical trials and to forecast when final conclusions will be reached, based on prearranged interim analyses of the data. Insurance companies use R to build analytical models that calculate risk profiles and set premiums.
Reference: https://www.r-project.org/

Table 5 lists tools that can be used in future for visualizing healthcare data, with their description, features, and related link.

Table 5. Visualization tools that can be used in healthcare.

Tool: NodeBox
Description: A tool for creating generative designs.
Features: Imports data in various formats such as Excel; supports animation.
Link: https://www.nodebox.net/

Tool: Flot
Description: Flot is a pure JavaScript plotting library for jQuery, with a focus on simple usage, attractive looks, and interactive features.
Features: Helps in plotting categorical and textual data and in combining display elements in the same data series; produces interactive visualizations with toggling series.
Link: http://www.flotcharts.org/

Tool: FF Chartwell
Description: FF Chartwell turns a simple series of numbers into editable data visualizations.
Features: The modules are useful for larger infographics; uses simple data series to produce charts and graphs.
Link: https://www.fontfont.com/

Tool: Raphael
Description: A JavaScript library for vector graphics on the web. Raphael uses SVG and VML to create graphics, so each object created is also a DOM object.
Features: Multi-chart capabilities; creates a range of graphs, charts, and other data visualizations.
Link: http://raphaeljs.com/

Tool: Crossfilter
Description: A JavaScript library capable of handling datasets with more than a million records. Crossfilter makes it possible to explore large multivariate datasets in a browser.
Features: Explores large multivariate datasets; fast incremental filtering and reducing; improves the performance of live histograms.
Link: http://square.github.io/crossfilter/

Tool: Google Charts
Description: Google Charts offers a selection of data visualization formats, ranging from simple scatter plots to hierarchical tree maps.
Features: Cross-browser compatibility; cross-platform portability.
Link: https://developers.google.com/chart/

Tool: Gephi
Description: Gephi lets users both visualize and explore data such as social networks, supporting complex link analysis for a better understanding of data relationships.
Features: Real-time visualization; deep data analysis to study relationships; built-in 3D rendering engine.
Link: https://gephi.github.io/

Tool: Tableau Public
Description: Tableau Public is an easy-to-use tool for creating and sharing data visualizations and embedding them on your website.
Features: Desktop application, but finished visualizations are stored on a public server; drag-and-drop interface; no programming skills required.
Link: https://public.tableau.com/s/

Tool: Quadrigram
Description: Quadrigram allows users to create fully customized visualizations using their own data and components from a built-in library that ranges from charts and graphs to quadrification and loaded flow.
Features: Sketching ideas and building rapid prototypes; quick data processing through a cloud-based computing system; complete library of interactive visualizations; builds animations, dashboards, and more.
Link: http://www.quadrigram.com/

Tool: Perfuse
Description: Perfuse is a data visualization tool that has been used by the IBM Visual Communication
Features: Drag-and-drop components to build visualizations; upload data in CSV or Excel format; share publicly or privately; sandboxes for analysis and sales data.
Link: http://visualizefree.com/

Table 6 describes all the tools that need programming knowledge to visualize healthcare data.

Applications of Big Data in Healthcare
Big data can be applied in almost all areas of healthcare management. The potential application areas include fraud detection, epidemic spread prediction, omics, clinical outcomes, medical device design and manufacturing, the insurance industry, personalized patient care, and pharmaceutical development [20]. Moreover, big data is widely adopted in personalized healthcare, which offers an individual-centric approach [58].

Applications of Big Data in 'Omics'
'Omics' data refer to large datasets in the biological and molecular fields (e.g., proteomics, metabolomics, microbiomics, genomics, etc.). The aim of applying big data in this field is to understand the mechanisms of diseases and improve the specificity of medical treatments (e.g., "precision medicine") [59]. With the advances in metabolomics, proteomics, genomics, and other omics technologies over the past decades, a remarkable volume of data related to molecular biology has been produced [60].
Genomics is the study of genes and their functions [61]. The application of big data in genomics will help to prevent or cure diseases and to deliver personalized care to each patient [62]. This area is still emerging, with applications in specific focus areas such as leukemia, diabetes, and cancer [63][64]. Pathway analysis is widely used for high-throughput genome-scale data [65], and three generations of methods are used in pathway analysis [66]. The first-generation tools are ClueGO, Onto-Express, and GoMiner [67]. The most popular second-generation tool is GSEA [68], and an example of a third-generation tool is Pathway-Express [69].
Proteomics is the study of the proteome, including protein structures and functions. A proteome is the entire set of proteins in a cell. ExPASy (http://www.expasy.org/proteomics) lists dozens of databases on proteomics and over 100 tools. Big data applications in proteomics will play a major role in predicting and preventing human cancer [70]. FindMod [71] and CSS-Palm [72] are frequently used for predicting post-translational modifications (PTMs).
Metabolomics is the systematic study of chemical processes involving metabolites. The BiGG database uses a genome-based reconstruction of human metabolism for systems biology [73].

Insurance Industry / Payer
Healthcare insurance companies (payers) use big data in underwriting, fraud detection, and claim management. Insurers are looking beyond claim-centric algorithmic fraud detection practices to person-centric ones [74]. For example, they examine how many similar claims were submitted by the same person, or whether the same treatment was claimed from different insurance companies.
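The person-centric check described above can be illustrated with a small grouping routine. This is a hedged sketch only: the claim fields (`person_id`, `treatment`, `insurer`) and the function name are hypothetical, and real fraud detection would involve far richer matching than exact equality of identifiers and treatment labels.

```python
from collections import defaultdict


def flag_cross_insurer_claims(claims):
    """Group claims by (person, treatment) and flag those sent to several insurers."""
    insurers_by_key = defaultdict(set)
    for claim in claims:
        key = (claim["person_id"], claim["treatment"])
        insurers_by_key[key].add(claim["insurer"])
    # Keep only the person/treatment pairs claimed from more than one insurer.
    return {
        key: sorted(insurers)
        for key, insurers in insurers_by_key.items()
        if len(insurers) > 1
    }


if __name__ == "__main__":
    sample_claims = [  # invented illustrative data
        {"person_id": "P1", "treatment": "knee MRI", "insurer": "A"},
        {"person_id": "P1", "treatment": "knee MRI", "insurer": "B"},
        {"person_id": "P2", "treatment": "knee MRI", "insurer": "A"},
    ]
    print(flag_cross_insurer_claims(sample_claims))
```

Grouping by person rather than by individual claim is what makes the check person-centric: each claim may look legitimate in isolation, and only the pattern across insurers raises a flag.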

Medical Device Design and Manufacturing
Big data tools allow a wider set of device materials, delivery methods, tissue interactions, and anatomical configurations to be evaluated. Computational techniques and big data can play a significant role in medical device design and manufacturing [75].

Pharmaceuticals
Big data is used during all phases of pharmaceutical development, particularly for drug discovery [57]. Pfizer recently initiated its Precision Medicine Analytics Environment program, which connects the dots among electronic medical record data, clinical trial data, and genomic data to identify opportunities to rapidly deliver innovative medicines for specific patient populations.

Personalized Patient Care
Healthcare big data will make it possible to deliver better and more personalized patient care. In the near future, new big data-derived insights will prompt timely updates of diagnostic assistance, clinical guidelines, and patient triage to permit more specific and personalized treatment and improve medical outcomes for patients (Yang et al., 2014).

Conclusions and Future Research Directions
This study reviewed the literature on big data in healthcare. The gap in the literature in terms of methods is identified and connected with empirical studies.
The most challenging aspects of big data in healthcare are data privacy, data leakage, data security, efficient handling of large volumes of medical imaging data, information confidentiality, misuse of health data or failure to safeguard healthcare information, understanding unstructured clinical notes in the right context, and extracting potentially useful information [76].
Some limitations were identified during the review. In particular, only a limited number of publications are available on big data in healthcare management. This review addressed the major challenges of data governance in healthcare; a more detailed study of the data governance component can be a topic for future research.
The authors suggest a few new data visualization tools to help healthcare analysts make effective decisions. Big data has great potential to improve healthcare management and take the healthcare industry to the next level. This review article will benefit healthcare academicians, practitioners, and researchers engaged in the area of healthcare management.