Agile Data Warehousing for the Enterprise: A Guide for Solution Architects and Project Leaders
Ebook, 1,470 pages, 22 hours


About this ebook

Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines:

  • Requirements management benefits from streamlined templates that not only define projects quickly, but ensure nothing essential is overlooked.
  • Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs. 
  • Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines. 

Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way.

  • Learn how to quickly define scope and architecture before programming starts
  • Includes techniques of process and data engineering that enable iterative and incremental delivery
  • Demonstrates how to plan and execute quality assurance plans and includes a guide to continuous integration and automated regression testing
  • Presents program management strategies for coordinating multiple agile data mart projects so that over time an enterprise data warehouse emerges
  • Use the provided 120-day road map to establish a robust, agile data warehousing program
Language: English
Release date: Sep 19, 2015
ISBN: 9780123965189
Author

Ralph Hughes

Ralph Hughes, former DW/BI practice manager for a leading global systems integrator, has led numerous BI programs and projects for Fortune 500 companies in aerospace, government, telecom, and pharmaceuticals. A certified Scrum Master and a PMI Project Management Professional, he began developing an agile method for data warehousing 15 years ago, and was the first to publish books on iterative solutions for business intelligence projects. He is a veteran trainer with the world's leading data warehouse institute and has instructed or coached over 1,000 BI professionals worldwide in the discipline of incremental delivery of large data management systems. A frequent keynote speaker at business intelligence and data management events, he serves as a judge on emerging technologies award panels and program advisory committees of advanced technology conferences. He holds BA and MA degrees from Stanford University, where he studied computer modeling and econometric forecasting. A co-inventor of Zuzena, the automated testing engine for data warehouses, he serves as Chief Systems Architect for Ceregenics and consults on agile projects internationally.


    Book preview


    Chapter 1

    Solving Enterprise Data Warehousing’s Fundamental Problem

    Data warehouses used to be too expensive and take far too long to build. Agile data warehousing techniques honed during the past 15 years have solved this problem. At their root, agile data warehousing methods incorporate practices such as Scrum or Kanban to accelerate programming, but this strategy alone is not enough because poor software engineering practices in the phases leading up to or following application coding can fatally undermine an enterprise data warehousing (EDW) project. Agile EDW teams must also utilize new, incremental approaches to requirements management, data modeling, and quality assurance. Generic agile techniques will not suffice in these areas because EDW applications have multi-layer data architectures and encounter cross-organizational challenges while defining the company’s data and metadata. EDW requirements must represent stakeholders from at least three levels within the company; data modeling must draw upon hyper modeled designs; and quality assurance must address a large matrix of test types, architectural layers, and stakeholder groupings.

    Keywords

    Enterprise data warehousing; business intelligence and data analytics; requirements management; data modeling; quality assurance; agile methods; Scrum and XP; Kanban; lean software development; Rational Unified Process

    Let me open this book with an extraordinary claim: After 30 years, we have finally solved the fundamental problem of enterprise data warehousing. This fundamental problem can be stated simply: in theory, an enterprise data warehouse can be extremely valuable to the sponsoring organization, but in practice one cannot be implemented quickly enough or at a cost that company executives consider reasonable. People like the idea of an enterprise data warehouse (EDW), a shared repository of standardized and trustworthy information on company events and circumstances, integrated across the many business units within the corporation. What they do not like is that they must wait the better part of a year and invest millions of dollars, only to receive a disappointingly small subset of the capabilities they expected. When pursued with a traditional software engineering approach, enterprise data warehouses simply take too long and cost too much to build. With the agile techniques presented in this book, I believe that we have solved that problem.

    I have been working in data warehousing since the early 1980s, in roles ranging from extract, transform, and load (ETL) programmer to business intelligence (BI) developer, integration tester, lead designer, project manager, and, more recently, program architect. During the first 15 years of my career, the EDW projects I joined or led were managed using traditional project management techniques. Like many software efforts in that era, these data warehousing projects proved to be so protracted and stressful that they disappointed both the developers and the customers when many of the promised features had to be dropped to meet time and budget constraints. Though my teammates suggested that all large projects naturally experience such challenges, I wondered why we as an industry were not improving our performance as the years went by. Project managers were certainly introducing far more monitoring and control into the methods we employed, but if anything, the project outcomes were getting worse.

    I started to see that the EDW development profession had fallen into a negative feedback loop, and that this downward spiral was actually the cause of data warehousing’s fundamental problem. As shown in Figure 1.1, this feedback loop begins with the perception that EDW applications are large, complex, and therefore risky to build. We fear failure, so we adopt a plethora of extremely risk-averse engineering and project management practices that make our developers’ task lists considerably longer. The tasks themselves become more difficult to complete due to all the audits and reporting steps that project management requires in order to know that the process is on track. Unfortunately, these longer task lists make the EDW development project even more complex and all the more likely to fail. The higher price tag of the task list and the increasing failure rates heighten the EDW’s perceived risk, driving everyone involved into another lap around the fear circle. After a few cycles of this negative feedback, the development process has become so riddled with controls and audits that one wonders how the programmers will be able to get any significant work completed at all.

    Figure 1.1 The negative feedback loop present in most traditionally managed projects.

    The Agile Solution in a Nutshell

    The agile software development movement that started in the early 2000s solved a very similar problem, though it was geared toward the programming of transaction capture systems—that is, non-data warehousing applications. The highlights of the generic agile software development strategy consist of the following:

    • Progressive decomposition of requirements to generate a simple list of programming tasks

    • Co-located, self-organized teams of developers

    • Iterative programming techniques that deliver small slices of the application every couple of weeks

    • Frequent review of those small slices by one or more members of the end-user community

    Many data warehousing teams attempted to utilize this incremental delivery approach, but for a long time they struggled to perform as well as agile developers building transaction-capture systems. This early difficulty was largely due to the fact that data warehouses differ from transaction-capture applications in two crucial ways. First, they have data architectures with two to four times as many layers as transaction systems, often with each layer requiring its own data modeling strategy, a different flavor of data transforms, and even a unique development tool set. It turns out that constructing a data warehouse is like building three to eight separate transaction systems at once.

    Second, an EDW’s data repositories amass billions if not trillions of records. The initial data load required to put the data warehouse into production usage often runs for many days or weeks. When the warehouse’s design must change, the development team can be forced to scrap large portions of the data already captured and repeat the long initial data load. Moreover, if the source for that data is no longer available, the team must then invest hundreds of hours writing, running, and validating conversion scripts that retrofit millions of data records to comply with the warehouse’s new design. Evolving an existing data warehouse is like dragging a ball and chain through a swamp.

    This double challenge of building and evolving a data warehouse lies at the heart of the fear-driven failure cycle in our profession. Because a single oversight in requirements and design could invalidate months of programming or require weeks of frantic data conversion, data warehousing professionals believed they could not employ agile’s iterative and incremental approach. All requirements had to be identified before design work could begin, and the design had to be complete and bulletproof before programming could start. Without an incremental delivery strategy, the EDW profession remained mired in the negative feedback loops that agile teams building transaction-capture systems escaped long ago.

    The solution to the data warehousing predicament emerged only in the past few years with the advent of incremental data modeling techniques. This new approach to designing a warehouse’s data schemas allowed large data repositories to be adapted for new designs after they are initially loaded—without requiring expensive reloads or conversion scripting. These new data modeling techniques worked from the inside out, to make the entirety of agile data warehousing suddenly feasible. Once a team could economically evolve a data warehouse, it was then free to design incrementally, and consequently its analysts could detail requirements a chunk at a time. The big, complete, and perfect specification up-front was no longer necessary. Although a good overall vision for the project is still necessary, by and large data warehousing teams can program and deliver an enterprise business intelligence application one piece at a time. They can readily steer their programming efforts to address many more of their customers’ short-term goals, making EDWs far more responsive to business needs—making them, in fact, agile. Considerable thought and innovation are still required to adapt all of the software engineering processes besides programming to the peculiarities of data warehousing. However, that remaining work proves to be fairly straightforward now that the data engineering component has been solved.
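    The idea of adapting a loaded repository without reloading it can be illustrated with a small sketch. The example below is only a generic additive-change pattern written in Python with SQLite, and it should not be read as the hyper modeling techniques this book presents. The table names (customer_core, customer_segment) and the helper functions are hypothetical, chosen purely for illustration. A new requirement is met by adding a side table keyed on the existing business key, so the rows already loaded are never rewritten.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory repository with one already-loaded table.

    Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_core ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer_core VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.commit()
    return conn


def add_attribute_additively(conn: sqlite3.Connection) -> None:
    """Meet a new requirement (customer segment) with a side table.

    customer_core is not altered or reloaded; only new rows are written.
    """
    conn.execute(
        "CREATE TABLE customer_segment ("
        "customer_id INTEGER PRIMARY KEY "
        "REFERENCES customer_core(customer_id), "
        "segment TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customer_segment VALUES (?, ?)", (1, "enterprise"))
    conn.commit()


def customer_view(conn: sqlite3.Connection) -> list:
    """Present old and new attributes together; a missing segment is None."""
    return conn.execute(
        "SELECT c.customer_id, c.name, s.segment "
        "FROM customer_core c "
        "LEFT JOIN customer_segment s ON s.customer_id = c.customer_id "
        "ORDER BY c.customer_id"
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    add_attribute_additively(warehouse)
    for row in customer_view(warehouse):
        print(row)
```

    In this sketch the cost of the design change is proportional to the new data only, which is the property that makes designing a chunk at a time economical.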

    Five Legs to Stand Upon

    In the past 15 years, I have worked with agile teams that have steadily adapted iterative, incremental development techniques to meet the demands of large, data-driven applications such as enterprise data warehouses. These adapted agile practices have certainly accelerated EDW delivery speeds, frequently by a factor of two or three. More importantly, these new agile techniques for EDW have kept the business sponsors and project stakeholders solidly in the loop, providing frequent reviews of crucial design decisions as each new component is coded. Such frequent business reviews regularly catch misconceptions regarding requirements and design, keeping the development effort intently focused on the features essential for project success and eliminating ill-conceived programming objectives that would have only wasted time and resources. By largely eliminating the risk within large EDW projects, the techniques remove the fear that used to drive us to the specification- and process-heavy project management styles that formerly doomed our applications to failure.

    Unfortunately, thousands of data warehousing programs throughout the world still suffer from the waste and frustration forced on them by the fear-driven death spirals. The mission of this book is to illustrate the alternative strategies and techniques that agile enterprise data warehousing teams utilize for building large, data-driven applications. I hope that, with the agile EDW approach well documented, sponsors, stakeholders, and development team leaders can successfully advocate that their companies switch to an incremental, risk-mitigating approach for their next data warehousing project.

    The full practice of agile enterprise data warehousing is a large assembly of principles and techniques. The practitioners of agile enterprise data warehousing derived this collection over many years by borrowing pieces from four different agile methods: Scrum, XP, Kanban, and RUP. We also incorporated a few old-school disciplines from management information science, such as requirements management and quality assurance. By merging these multiple influences and sharing our experiences with each other, our community of DW/BI professionals has arrived at what I consider a baseline approach to enterprise agile data warehousing.

    This baseline approach consists of five major elements, as illustrated by the mind map in Figure 1.2. These adapted software engineering disciplines represent the five legs that the full agile EDW method stands upon:

    1. Iterative, incremental application coding (AC) techniques that provide not only faster delivery speeds but also significant risk mitigation

    2. Streamlined requirements management (RM) that makes the work of defining a project quick and focused

    3. Adaptive data engineering (DE) skills that allow a warehouse’s data repository to be built incrementally, then economically revised as requirements change, even after it has been loaded with data

    4. Balanced quality assurance (QA) efforts that instill test-led development at all levels of project work

    5. Several productivity tools organized into a repeatable value cycle (VC) for creating incremental subreleases, a cycle that amplifies the ability of the other four elements to accelerate deliveries and mitigate risk

    Figure 1.2 The five major components to agile enterprise data warehousing.

    This book steps the reader through each of these components and thus serves as a field manual for DW/BI development teams, both those that are just getting started and those that are seeking ways to bring new life to a struggling project. Putting these five legs to work gives even the largest enterprise data warehousing programs incredible traction against the challenges they must conquer—challenges such as uninvolved business partners, incomplete and inconsistent project definitions, rigid data models, and poorly coded application modules. Incorporating these five adapted disciplines allows agile EDW teams to steadily chip away at the unknowns in both business and technical requirements, translate them into lists of actionable development tasks, and steadily deliver a growing collection of user-validated features and performance capabilities.

    These agile practices convert the entire EDW development experience into a far more understandable and predictable process for everyone involved, including project sponsors and business stakeholders. The net result is a spiral that operates in the reverse manner of the cycle diagrammed previously. As depicted in Figure 1.3, an agile EDW project experiences a positive feedback loop. The desired application is still large and complex, but instead of specifying every last detail of the application before coding begins, the team decomposes the work into small increments that can be easily accomplished sequentially. As the team develops the modules for each increment, the business can validate both the new features they offer and how they integrate into an overall system. The enormity of the project transforms into a list of components that both the business and IT readily understand and that can be delivered one after the other without incurring serious risk. With such clarity and low risk, sponsors and project managers can lighten up on the audits and process controls, allowing the programmers to work far more quickly and judging the project’s progress by the working modules created.

    Figure 1.3 Agile EDW practices switch projects to a positive feedback loop.

    The Agile EDW Alternative is Ready to Deploy

    This book is designed with two audiences in mind: EDW sponsors and EDW team leaders. By EDW sponsors, I mean the executives and the representatives of a company department that is funding the development of a data integration application, perhaps with a BI or data analytics front end. These folks on the business side of a project need to realize that an agile alternative to traditional, failure-prone development methods exists. Understanding the nature and advantages of the agile alternative will empower these sponsors to insist that the development teams and project managers who work for them employ an incremental delivery approach.

    When referring to EDW team leaders, I am thinking of the members of the development group other than the programmers who build the data integration and BI components. This group includes roles that go by many names, including solutions architects, project architects, business analysts, data architects, data modelers, systems analysts, technical leads, and system testers. People who fill these roles on a team are usually veteran DW/BI developers and have most likely seen how EDW projects go wrong. Understanding the nature and advantages of the agile EDW method will enable these team leaders to identify methodological problems as they occur within a project and to articulate effective remedies, should they believe that their current project is slipping into a fear-driven death spiral.

    For the EDW sponsors, the message in this book can be summarized as a warning:

    The project managers working with the information technology (IT) department probably subscribe to an old-fashioned approach to running programming projects. The method they are planning to use to build your enterprise data warehouse has fundamentally misjudged the best way to mitigate the risk of large software development programs. Following their outdated methods, these project managers will lead your development project into a swamp of details and wasted effort from which your data warehousing program will never escape. Because their method is so risky and labor intensive, chances are you will never see half of the EDW features you were promised. Even the minimal data warehouse they will eventually deliver will prove to be impossible to adapt in a business-reasonable timeframe when new user requirements emerge.

    To save your program, you must convince IT to employ an iterative delivery method, such as the one presented in this book. By following agile enterprise data warehousing, your development team will be able to provide your company with world-class business intelligence in a fraction of the time, money, and frustration that traditional methods involve. Moreover, you will know throughout the project whether IT is truly achieving your goals. Agile EDW will rapidly provide the business intelligence your company needs to compete and thrive, and it will deliver this capability with far less risk.

    For EDW team leaders, the message of this book is an exhortation to see past the ossified software engineering approach that most of us have followed blindly for years:

    Try to see the risk of enterprise data warehousing from the project sponsors’ point of view, and realize that a delivery schedule measured in years makes no sense for a business analytics development program. Your company has to adopt a faster DW/BI delivery approach in order to compete effectively in the global marketplace and to survive. The risk mitigation strategies presented in this book are rooted in new, agile approaches for requirements management, data modeling, and quality assurance. This combined approach offers a new way to work that delivers DW/BI systems far faster and with more effective safeguards against project failure. When your EDW sponsor says, “IT has got to start delivering faster, better, and cheaper,” tell the sponsor that you now have a new, agile method for achieving exactly that goal.

    Defining a Baseline Method for Agile EDW

    As a further purpose for this book, I hope to contribute to the notion of what a standard method for agile data warehousing might be. During the past 15 years, the consultants in my company and I have encountered a wide variety of iterative development practices that the people leading those efforts have all called agile data warehousing, even though many of them were clearly ineffective. To help companies avoid false starts in the future, I believe the community of agile EDW practitioners should settle upon a constellation of practices that they consider generally necessary and sufficient for a reliable incremental EDW development method. Such an outline of a standard agile data warehousing method would enable an existing development team to easily spot the gaps and misinterpretations of principles that undermine a team’s particular agile implementation. It would also sketch for a company wanting to go agile a proper series of steps for such a transformation, since the complete collection of practices is too large and involved for new teams to implement in a single pass.

    The agile methods available today are not really methods but instead high-level collaboration models. Accordingly, every agile EDW team that I have been asked to coach has derived its own, unique interpretation of iterative development. Variety among agile implementations is a perfectly acceptable result, given that agile principles encourage teams to self-organize and adapt the suggested techniques to meet their particular circumstances. Unfortunately, many of the homegrown implementations I have encountered were incomplete, sometimes grievously so. A good example is a telecommunications firm that invited me to help it because it realized it was practicing Scrum-But—regular sprints, as suggested by the Scrum textbooks, but without story conferences, task planning, product demonstrations, and iteration retrospectives.

    After seeing many incomplete implementations, I realized that companies that desire to adopt a truly effective agile practice need to do far more than just hire a Scrum master or Kanban coach for their projects. In order to succeed, agile developers must certainly master iterative programming techniques, but that achievement will only be the first step in their transformation to a high-performance team. A world-class agile development team must also develop or acquire solid adaptations of the remaining disciplines listed previously. So that teams do not fall into the Scrum-But trap of pursuing large EDW programs with only small fractions of the necessary disciplines in place, those of us who write, speak, and tweet about agile data warehousing could develop a shared notion of what a complete agile EDW implementation includes and thoroughly embed that concept into the advice we provide.

    My first two books focused mostly on just one of the five disciplines listed previously—the agile coding practices. They touched lightly on the details of requirements management and quality assurance but left the high-level organization of those disciplines unaddressed. They said little about adaptive data modeling and value-driven release cycles. This current book fills those gaps by describing the adaptations that my colleagues and I have derived for the four disciplines that should surround and support agile programming techniques. Because the agile community is constantly innovating, I am sure that should a standard method for agile EDW someday emerge, it will be significantly broader and better honed than the package of disciplines I have been able to present in my works. But I hope that my books will help the EDW profession to begin deriving a baseline agile method for our craft so that in the future, new teams can quickly arrive at development iterations that reliably achieve 90–95% of their objectives, month in and month out.

    Agile concepts are already so numerous and large that even within the space of three books, I believe I have been able to merely sketch the core practices that a DW/BI team would need. For the complete collection of practices, EDW leaders should draw from several other agile data warehousing books, such as Agile Analytics by Ken Collier, Agile Data Warehouse Design by Lawrence Corr, Agile Database Techniques by Scott Ambler, and Building the Agile Database by Larry Burns. EDW leaders will benefit also from recommendations found in the seminal works addressing general agile topics, such as Extreme Programming Explained by Kent Beck, Lean Software Development by Mary and Tom Poppendieck, Agile Estimating and Planning by Mike Cohn, and Scaling Software Agility by Dean Leffingwell. The wisdom and details that agile EDW team leaders need are already contained in these works. What I have tried to contribute with my work and this book in particular is to sketch in a single place what the overall package of necessary skills looks like and how the pieces can fit together, reinforcing each other and thereby yielding a solid, fault-tolerant, and extremely powerful approach.

    Although the body of knowledge for agile EDW is so large that it can be intimidating, EDW leaders should rest assured that it does not all have to be incorporated into a team’s practice at once. When my consultants and I start a new agile EDW program from scratch, we ask a customer’s DW/BI teams to start with only two of the five disciplines. Most of the teams focus on the agile coding method because it embodies many of the principles and philosophies that must be instilled eventually in all of the disciplines. As a parallel effort, we steer the data architects toward learning agile data modeling techniques so that the design undergirding the EDW program will allow frequent design revisions and incremental learning. Once the team members are fluent in incremental coding practices, we turn their attention to incremental requirements management because this discipline excels at defining small chunks of work that flow perfectly into an iterative programming process.

    When the team is ready for another transition step, we typically introduce incremental quality assurance so that the developers start receiving solid feedback on whether their agile requirements and coding are truly effective. We usually reserve the adoption of productivity tools for last so that the team’s preferred method determines the tools utilized rather than having the tools dictate how the team will work.

    Many people challenge my company for including the notion of productivity tools as part of a method, but my colleagues and I have seen the methods of many teams evolve considerably once a tool eliminates hours of work inherent in a key development step. Whereas disciplines one through four listed previously can easily triple a team’s delivery pace, employing the tools can offer a second tripling in velocity, so it would be negligent not to give tools a place in the baseline agile data warehousing method. Readers will find that I treat the tools fairly generically in this book, so that the discussion remains firmly focused on how tools must align with a team’s preferred development process rather than sinking into a morass of details concerning how developers should employ the tools’ many features.
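    The arithmetic behind the two triplings is worth spelling out, because it connects to the one-tenth figure quoted later in this chapter. The short Python sketch below assumes the two improvements are independent and simply multiply; that assumption, and the function name combined_speedup, are illustrative only.

```python
def combined_speedup(*factors: float) -> float:
    """Multiply independent speedup factors (an illustrative assumption)."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


# Figures taken from the text: disciplines one through four can triple the
# delivery pace, and productivity tools can offer a second tripling.
disciplines_factor = 3.0
tools_factor = 3.0

total = combined_speedup(disciplines_factor, tools_factor)
remaining_fraction = 1.0 / total

print(f"Combined speedup: {total:.0f}x")
print(f"Time per delivery relative to baseline: {remaining_fraction:.2f}")
```

    Under that assumption the combined gain is a factor of nine, so each delivery takes roughly one-ninth of the original time, which is close to an order of magnitude.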

    By combining the disciplines outlined in this book, the additional reading I have recommended, and a light consideration of productivity tools, DW/BI team leaders should feel that they have a good baseline description of an agile method that will enable them to both plan the broad arc of an agile transition for their companies and regularly assess where methodological gaps and misinterpretations have hampered a current implementation.

    Plenty of Motivation to Go Agile

    The motivation to switch a traditional DW/BI department team to iterative techniques is easy to articulate: Agilizing a company’s approach to requirements, data modeling, and quality can improve an EDW program’s delivery speed and development costs by a factor of three. Not coincidentally, agilizing will also drive the defect rate for DW/BI enhancements toward zero, eliminating many risks and greatly increasing customer satisfaction. For teams that add the productivity tools now available, agile practices should allow EDW programs to deliver new business intelligence services with an order of magnitude less labor and time than required by traditional project management and software engineering practices.

    I provide evidence for this bold claim in the next few chapters, but first let us consider the impact that a significant acceleration in delivery speed can have for an organization’s EDW program. To put it succinctly:

    • Business intelligence contributes enormously to the fortunes of the companies we work for.

    • Delivering effective business intelligence does not have to be slow, expensive, and prone to failure.

    • Agile enterprise data warehousing offers an adaptable path to delivering quality business intelligence in one-tenth the time and cost of traditional software development techniques, greatly reducing the risk inherent in EDW programs.

    • Businesses that can reliably build decision support systems to answer crucial business questions in one-tenth the time will be the first companies to seize new business opportunities and will lead their industries’ cost curves downward.

    This reasoning is why agile data warehousing matters tremendously, and why I have dedicated three books to presenting the approach.

    Structure of the Presentation Ahead

    Given the crucial importance of the five legs for agile DW/BI, this book dedicates a set of chapters to each of them in the order listed in the mind map shown previously. Even at an introductory level, discussing methods and techniques that simultaneously affect delivery speed, project cost, and application quality could become an unwieldy presentation. Fortunately, one can understand the multiple components of an agile EDW approach by layering them inside out, much in the pattern by which teams would learn and implement these elements. Figure 1.4 shows this layered approach, and although this drawing depicts risk management as a separate component, in truth all the elements of the method reduce project risk by making EDW development faster, better, and cheaper. For that reason, risk mitigation will serve as a unifying theme that spans all the topics we touch upon.

    Figure 1.4 How a team might acquire agile EDW techniques working from the inside out.

    Part I introduces the agile coding techniques that lie at the heart of agile enterprise data warehousing. The agile coding techniques that my colleagues and I have derived from Scrum and Kanban were covered in detail in my previous two books, so this portion of the text will outline the topic only enough to allow readers who are new to iterative methods to gain a basic familiarity with this foundational material.

    Part II begins the discussion of how to employ agile techniques to reduce the risk of BI application projects, both large and small. I summarize the major adaptations to generic agile development methods that DW/BI teams must make to (1) accommodate the added complexity of multilayered data integration applications and (2) pursue the project with teammates who have several non-overlapping technical specialties. Embedded within that presentation are the definitions of the many terms for both traditional and agile development concepts that I employ throughout the remaining chapters. This analysis also illustrates how serious conceptual errors originate from three separate levels in DW/BI projects, and it then explains how agile thinking and iterative techniques drive those risks out of the projects that make up an EDW program.

    Part III outlines agile EDW’s twin approaches to requirements management. First, it discusses the lightweight style for requirements that is utilized by agile teams practicing methods such as Scrum and Kanban. This style serves as a foundation for agile projects and works well for smaller, data mart projects. The text then introduces a flexible, yet far more capable, requirements management system, which my company adapted from an older, more industrial-strength iterative method known as the Rational Unified Process (RUP).

    Part IV presents the new concept of agile data engineering, which incorporates hyper data modeling techniques. These innovative data modeling techniques enable data warehousing teams to start with small data repositories and evolve them later as requirements change, without incurring ruinously expensive re-engineering and data conversion costs. After reviewing the role that data virtualization and big data technology can play in an agile EDW program, the chapters in this part of the book present two styles of hyper modeling: hyper normalization and hyper generalization. Since re-engineering costs represent an enormous portion of the EDW’s total cost of development and ownership, these chapters compare the effort needed to re-engineer an EDW data schema using both traditional and hyper modeled design techniques so that readers can appraise hyper modeling’s cost-reduction potential for themselves.

    Part V focuses on planning an agile quality assurance effort for an enterprise data warehousing program. It first distinguishes between quality management, quality assurance, and quality control and then describes streamlined approaches to all three. It illustrates the effort needed to achieve the extensive progression and regression testing that fast-moving EDW programs absolutely require in order to deliver defect-free applications that delight their end users. This portion of the book also discusses automating the deep execution cycles that full EDW regression testing demands. Automating regression testing allows EDW teams to dedicate far more labor resources to adding new features to the BI applications instead of exhausting themselves by constantly re-validating what they have already built.

    Part VI unites the multiple components discussed previously into a single, eight-step value cycle for creating an EDW subrelease. Subreleases form an important part of the agile EDW risk mitigation strategy. The value cycle proposed for each subrelease will not only draw from the new techniques for requirements, design, and quality that are offered in this book but will also illustrate how to support data governance goals and incorporate the latest crop of productivity enhancement tools into a team’s iterative delivery approach.

    Given the many aspects of agile EDW that are contained in this book, Part VI concludes with short statements that both project sponsors and team leaders might employ to quickly orient everyone who is involved in these enormous projects to the new realities that incremental delivery methods engender. The short statement for the project sponsor manifests as an EDW Customer’s Bill of Rights, which distills what executives can expect from their DW/BI development teams now that a comprehensive agile method for data warehousing projects exists. For EDW team leaders, the short orientation statement I offer is an extension of the agile manifesto that includes the additional philosophies teams will need in order to meet the high expectations that the Customer’s Bill of Rights will inspire.

    Understanding agile techniques and using them to mitigate EDW program risk requires new thinking and ceaseless efforts to control a project’s or program’s use of time, expenditure of funds, and the quality of its deliverables. However, achieving 10-fold better utilization of company resources is a goal that makes the effort required to learn and implement new ways to work well worth the investment. In the past, the high risk inherent to EDW applications forced DW/BI departments to pursue their projects with extensive specifications up-front, despite the fact that such an approach is slow and prone to failure. This book attempts to clearly articulate the agile alternative so that the decision makers in those departments will have both the knowledge and the motivation to make a change for the better.

    Summary

    Traditional enterprise data warehousing projects easily fall into a negative feedback loop where fear of failure drives companies to instill so many checks and controls on the development process that delivery of value to business stakeholders slows to a crawl. To some extent these process bottlenecks can be corrected by switching to generic incremental programming methods such as Scrum and Kanban once those starter methods have been adapted for the additional complexity that data integration adds to a software development project. In order to deliver at maximum speed and with minimum risk, development teams will also need agile adaptations for the remaining components of the application development life cycle that wrap around the work of programming data transforms and front-end modules. Whereas my earlier books focused upon accelerating the work of programming business intelligence applications, this volume provides detailed guidance for fast and incremental approaches to the three remaining engineering disciplines that every EDW team must master: requirements management, database design, and quality assurance. It also describes how the latest productivity tools for data analytics, such as data virtualization, data warehouse automation, and big data management systems, offer teams a new type of application development value cycle that dramatically reduces the amount of labor needed to design, build, and deploy each incremental version of an enterprise data warehouse. By following the suggestions provided in the chapters ahead, EDW project leaders such as solution architects, data modelers, and system testers can accelerate their team’s delivery pace by a factor of three. Moreover, by incorporating the new breeds of productivity tools on top of those process improvements, EDW project leaders can triple again their team’s delivery speed.

    Part I

    Summaries of Generic Agile Development Methods

    Outline

    Chapter 2 Primer on Agile Development Methods

    Chapter 3 Introduction to Alternative Iterative Methods

    Part I References

    Chapter 2

    Primer on Agile Development Methods

    Agile techniques for enterprise data warehousing (EDW) incorporate two methods directly related to the agile manifesto: Scrum and Extreme Programming (XP). Scrum organizes a team’s development work into time boxes, and XP articulates many labor-saving programming techniques that the team can employ. Unfortunately, these two methods are so general that they require many adaptations to make them appropriate for building data warehouses. To adapt them successfully, EDW team leaders will have to go beyond just memorizing the prescribed agile practices to the point of embracing the values and principles undergirding each approach. Altogether, Scrum and XP incorporate nine philosophies, 26 principles, and 24 suggested practices. Knowing these underlying values and principles will equip EDW team leaders to judge whether a proposed policy or practice will enhance or hinder a team’s effectiveness, thus enabling them to effectively protect a team’s agility.

    Keywords

    Agile methods; Scrum and XP; values and principles; incremental and iterative programming; iteration planning; story conference; product demonstrations; sprint retrospectives; time-boxed development

    Agile enterprise data warehousing (EDW) is a software engineering approach for data analytic systems that borrows from many techniques, old and new. At its core lie agile techniques for general programming that were borrowed from two schools of incremental, iterative development. To make sense of agile programming for data warehousing, the reader will need an overview of the general techniques taken from each of these schools. This chapter provides an introduction to the first school, which consists of methods descending from the agile manifesto, most notably Scrum and Extreme Programming (XP). Chapter 3 provides a quick look at the other school, namely lean software development and Kanban, plus a distant ancestor to all iterative approaches used today, the Rational Unified Process (RUP). The mind map shown in Figure 2.1 illustrates how the presentation of Scrum, XP, lean, Kanban, and RUP is divided between Chapters 2 and 3. For those readers not yet acquainted with iterative and incremental programming techniques, the two chapters in this opening section of the book should serve as a primer on the main methods and practices that agile has to offer, providing the background needed to understand the incremental approach to data warehouse development that will be presented later.

    Figure 2.1 Mind map of generic iterative methods summarized in Chapters 2 and 3.

    Because all of agile EDW’s ancestor methods have been well documented in other works, they are only summarized here. A couple of graphics will make these summaries easier to read. First, Figure 2.2 shows the family tree of methods and how they combined into the agile approach to data warehousing/business intelligence (DW/BI) proffered in this book. Second, Table 2.1 lists the primary components employed in agile EDW and documents the ancestor method in which they originated, although the exact origins of some were difficult to uncover completely.

    Figure 2.2 A family tree of methods and influences leading to the agile EDW method.

    Table 2.1

    Agile Elements by Origin

    Defining Agile

    Both traditional and agile approaches largely agree on the major steps and sequencing of activities that comprise disciplined software engineering: system requirements, application requirements, analysis, design, coding, testing, operations, and maintenance. Given the way manufacturing was organized in the mid-20th century, it was easy for project management to think that the work for each step should be finished completely before the development team moved on to the next, as if the project were simply a large automobile making its way along an assembly line. This traditional approach was clearly articulated in 1970 in a paper by TRW’s Winston Royce titled Managing the Development of Large Software Systems. It is often called the waterfall method because artifacts for each work step pool up until that step is complete and then cascade down into the next engineering activity, as shown in Figure 2.3. To be fair, Royce and other leading authors at the time were actually warning software developers against following this waterfall approach, urging information technology (IT) managers to either prototype heavily before programming or simply plan on throwing away the first version of an application:

    If the computer program in question is being developed for the first time, arrange matters so that the version finally delivered to the customer for operational deployment is actually the second version insofar as critical design/operations areas are concerned.

    [Royce 1970]

    Figure 2.3 The traditional waterfall method. Source: Adapted from [Royce 1970].

    Unfortunately, an approach exactly as depicted in Figure 2.3 was adopted into a 1985 U.S. military standard for systems development and then soon disseminated into the software industry by the military’s systems integration contractors [Department of Defense, 1985].

    By the mid-1990s, however, a radical alternative to the traditional approach to software development was in the air. The Standish Group had published two versions of its Chaos Report survey of 8380 development projects at 365 U.S. companies, revealing that the mainstream approach was failing more often than not to deliver projects on time, on budget, and with all their promised features. The Standish Group’s analysis revealed that only projects with very small scopes were achieving anything better than a 50% success rate [Standish Group 1995, 1999]. On a separate front, Japanese manufacturers had recently turned many traditional product engineering concepts on their heads and were decisively outcompeting their U.S. counterparts because they were able to introduce new products across a wide range of industries without having to invest in lengthy product design efforts and large
